Python Programming for Biology: Bioinformatics and Beyond



[Tim J. Stevens, Wayne Boucher] Python Programming

File reading examples

Reading whitespace-separated files

For our first practical example we will read a simple yet commonly used kind of file, one where each line has several fields separated by whitespace. By ‘whitespace’ we mean tab characters ('\t') or one or more spaces. An example of such a file is the following, where we first have a descriptive header line and then subsequent lines with three text fields: the first is the name of a chromosome, the second is a base-pair position in the chromosome and the last is an experimentally determined value for that position:

chromosome position value
chr1 3417953 0.74634
chrX 152662801 0.50036
chr7 55281536 0.82376
chr4 9168943 0.73375
chr1 13170641 0.42181

For the purposes of our example we will assume that the above lines are in a file called ‘chromoData.tsv’, which lies in the ‘examples’ sub-directory of the current working directory, where ‘.tsv’ hints that the format is tab-separated values. To process this file we first read the header line separately with .readline(), given that it doesn’t contain data we are interested in. Then we loop through the remaining lines by iterating over the file object, and for each line we use the string method split() to separate the line into a list of substrings. Without any arguments split() separates the fields at any run of whitespace, which is what we want. For a different file format we could specify a different separator: for example, split(',') for comma-separated fields or split('\t') for tab-separated fields, both of which can accommodate data items with internal spaces.
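As a brief aside, the difference between these separators can be seen on a few single lines. This is only a sketch: the three example strings, including the ‘sample A’ field, are made up for illustration and do not come from the data file above. Note that with an explicit separator the line ending is not removed automatically, so here we strip it first.

```python
# Hypothetical example lines, invented for illustration only
spaceLine = 'chr1   3417953\t0.74634\n'
tabLine = 'chr1\tsample A\t0.74634\n'
commaLine = 'chr1,sample A,0.74634\n'

# No argument: split at any run of spaces, tabs or newlines
spaceFields = spaceLine.split()

# Explicit separator: internal spaces survive, but the trailing
# newline must be removed separately
tabFields = tabLine.rstrip('\n').split('\t')
commaFields = commaLine.rstrip('\n').split(',')

# Splitting the tab-separated line on whitespace breaks 'sample A' in two
brokenFields = tabLine.split()

print(spaceFields)
print(tabFields)
print(commaFields)
print(len(brokenFields))
```

The last line shows why an explicit separator matters when a field may contain spaces: the whitespace split yields four items rather than three.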

fileObj = open('examples/chromoData.tsv')
values = []
header = fileObj.readline()  # Don't need this first line

for line in fileObj:
    data = line.split()  # Separate the fields at whitespace
    chromosome, position, value = data
    position = int(position)  # Text to integer
    value = float(value)  # Text to floating point number
    values.append(value)

fileObj.close()

mean = sum(values)/len(values)
print('Mean value', mean)

For each line we obtain a list with three items, and these are unpacked into separate chromosome, position and value variables. Initially these are text strings, given that they were just read from the file, but for the position and value we generally want to convert them from strings into integer and floating point data types respectively (though in this simple example we have not used the position). Accordingly we use the int() and float() functions to do the conversion. Once a variable has a numeric data type we can perform mathematical operations on it, like finding the mean value as illustrated.
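The conversion step can also be tried on a single line in isolation. The following is a minimal sketch that uses one line from the example data; the ‘n/a’ entry and the badValue variable are assumptions added here to show what happens when a field is not a valid number.

```python
line = 'chr7 55281536 0.82376\n'
chromosome, position, value = line.split()

# Straight after splitting, every field is still a text string
print(type(position), type(value))

position = int(position)
value = float(value)

# Arithmetic is now possible on the converted values
print(position + 1)
print(value * 2)

# A field that does not represent a number raises ValueError
try:
    badValue = float('n/a')  # hypothetical missing-data marker
except ValueError:
    badValue = None

print(badValue)
```

How to treat such entries (skip the line, store None, or stop with an error) depends on the data set, so the choice made here is only one option.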

We will consider field-delimited formats again in the readListFile() function below, where we handle things in a more general way, allowing different data type conversion functions and field separators to be specified as function arguments.
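To give a rough idea of what passing conversion functions and a separator as arguments can look like, here is a minimal sketch. It is not the readListFile() function itself: the name readFieldsSketch, its parameters and the example lines are all hypothetical, chosen only to illustrate the general approach.

```python
def readFieldsSketch(lines, converters, separator=None, skipHeader=True):
    """Hypothetical helper: split each line and convert each field.

    lines      - any iterable of text lines, e.g. an open file object
    converters - one conversion function per field, e.g. [str, int, float]
    separator  - passed to split(); None means split at whitespace
    """
    rows = []

    for i, line in enumerate(lines):
        if skipHeader and i == 0:
            continue  # Ignore the descriptive header line

        line = line.rstrip('\n')
        if not line.strip():
            continue  # Ignore blank lines

        fields = line.split(separator)
        row = [convert(field) for convert, field in zip(converters, fields)]
        rows.append(row)

    return rows


# Made-up tab-separated lines standing in for a file
exampleLines = ['chromosome\tposition\tvalue\n',
                'chr1\t3417953\t0.74634\n',
                'chrX\t152662801\t0.50036\n']

rows = readFieldsSketch(exampleLines, [str, int, float], separator='\t')
print(rows)
```

Because the function only iterates over lines, the same call would work with an open file object in place of the list.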

