Python Programming for Biology: Bioinformatics and Beyond



[Tim J. Stevens, Wayne Boucher] Python Programming

Reading PDB files

PDB (Protein Data Bank) files were invented in the 1970s to describe the three-dimensional coordinates of biological macromolecules. As the name suggests, the format was initially designed for proteins, but the same system is now commonly used to represent DNA, RNA, carbohydrates, lipids, small molecules and any other biologically important molecule. PDB files can contain the description of multiple molecules and multiple structures, and can hold lots of other descriptive information. However, in this section we ignore all the complexities and concentrate only on the parts that specify the spatial coordinates. A PDB file is both key/value and line oriented, with the key at the start of each line giving context to the data in the remainder of the line. The coordinates we are interested in are in records where the line starts with the six characters ‘ATOM  ’ (with two spaces at the end), which can be thought of as the key. The x coordinate is given in columns 30 to 37, y in columns 38 to 45 and z in columns 46 to 53, assuming that the first column is column 0.
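
As a minimal sketch of this fixed-column layout, the snippet below builds a single made-up ATOM record using format widths (so that the columns are guaranteed to line up) and then slices out the key and the three coordinates. The atom name, residue and coordinate values are invented for illustration and do not come from any real structure.

```python
# Build an illustrative ATOM record; all field values here are made up.
# The field widths add up to 30 characters, so x starts at column 30
# (counting the first column as column 0).
prefix = "%-6s%5d %-4s%1s%3s %1s%4d%1s   " % (
    "ATOM", 1, " N", "", "MET", "A", 1, "")
line = prefix + "%8.3f%8.3f%8.3f" % (27.340, 24.430, 2.614)

key = line[:6]            # record type, padded to six characters
x = float(line[30:38])    # columns 30 to 37
y = float(line[38:46])    # columns 38 to 45
z = float(line[46:54])    # columns 46 to 53

print(repr(key), x, y, z)
```

Python slices exclude the end index, which is why columns 30 to 37 are written as line[30:38].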

The following example reads a PDB file to calculate the centroid of a structure, the average position of the atoms. Strictly speaking, this should be weighted by the mass of each atom, but we ignore that issue here (and in practice it does not make much of a difference). In a drawing application, if you rotate a molecule on the screen, it is generally desirable to rotate it about the centroid, otherwise the rotation looks odd.
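
To give a feel for the difference between the plain average and a mass-weighted one, here is a small sketch using three invented atoms. The function names centroid() and centre_of_mass() are hypothetical helpers for this illustration, the coordinates are made up, and the masses are approximate standard atomic masses.

```python
# Three made-up atoms: (element, x, y, z). Coordinates are illustrative only.
atoms = [("N", 0.0, 0.0, 0.0),
         ("C", 1.5, 0.0, 0.0),
         ("O", 1.5, 1.2, 0.0)]

# Approximate standard atomic masses in daltons.
masses = {"C": 12.011, "N": 14.007, "O": 15.999}

def centroid(atoms):
    """Unweighted average position of the atoms."""
    n = len(atoms)
    return tuple(sum(atom[i] for atom in atoms) / n for i in (1, 2, 3))

def centre_of_mass(atoms, masses):
    """Average position weighted by the mass of each atom."""
    total = sum(masses[atom[0]] for atom in atoms)
    return tuple(sum(masses[atom[0]] * atom[i] for atom in atoms) / total
                 for i in (1, 2, 3))

print(centroid(atoms))
print(centre_of_mass(atoms, masses))
```

For this toy example the two results differ only slightly, because the three elements have similar masses.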

The function takes the name of the PDB file as an argument, and returns the number of atoms found as well as the average x, y and z positions. As a PDB reader the function is very simple and naïve, and in any serious program you would do best to use an existing and tested parser, such as the one in the Bio.PDB module of Biopython. Nonetheless, the function will serve to illustrate the principles involved.

Initially we open the file object. Next, variables representing the number of atoms and the totals for the x, y and z coordinates are initialised to zero, before looping through each of the lines in the file. If a line begins with the desired ‘ATOM  ’ key the atom count is increased, the coordinates are extracted and the coordinate totals are increased. Once every line has been read, the file is closed. The coordinate data is initially just text characters from the file and needs to be converted to Python numbers (which can be added numerically). The Python float() function performs the conversion from text string to floating point number. So, for example, the string ‘12.572’ would be converted to the number 12.572.
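
The conversion can be tried on its own. The short sketch below uses illustrative values only and contrasts adding text strings with adding the converted numbers.

```python
text = '12.572'
value = float(text)          # the string becomes the number 12.572

# Adding strings joins them end to end; adding floats sums them.
joined = '12.572' + '1.0'                  # gives '12.5721.0'
summed = float('12.572') + float('1.0')    # numeric addition

# Leading and trailing spaces are ignored by float(), which suits
# the space-padded columns of a PDB file.
padded = float('  12.572 ')

print(value, joined, summed, padded)
```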

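def calcCentroid(pdbFile):

    # Open the PDB file for reading as text
    fileObj = open(pdbFile, 'r')
    natoms = 0
    xsum = ysum = zsum = 0

    for line in fileObj:
        # Coordinate records start with the six-character key 'ATOM  '
        if line[:6] == 'ATOM  ':
            natoms += 1
            # Fixed-width columns, converted from text to numbers
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            xsum += x
            ysum += y
            zsum += z

    fileObj.close()

    if natoms == 0:
        # No atom records were found, so avoid dividing by zero
        xavg = yavg = zavg = 0
    else:
        xavg = xsum / natoms
        yavg = ysum / natoms
        zavg = zsum / natoms

    return (natoms, xavg, yavg, zavg)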

Once the looping is done and the additions are complete, the averages are calculated by dividing the sum of each coordinate type by the total number of atoms. Note that if the PDB file has no atom records the averages are simply set to zero, since we cannot divide by zero. The function is then readily tested:

print(calcCentroid('examples/protein.pdb'))
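
If no PDB file is to hand, the calculation can be checked against a tiny file whose answer is known in advance. The following is a sketch under stated assumptions: calc_centroid_sketch() is a hypothetical variant of the function (it uses a with statement so that the file is closed automatically), atom_line() is a made-up helper that formats records with the right column widths, and the two atoms are invented.

```python
import os
import tempfile

def calc_centroid_sketch(pdb_path):
    """Variant of the centroid calculation; returns (natoms, xavg, yavg, zavg)."""
    natoms = 0
    xsum = ysum = zsum = 0.0
    with open(pdb_path) as file_obj:   # closed automatically on leaving the block
        for line in file_obj:
            if line[:6] == 'ATOM  ':
                natoms += 1
                xsum += float(line[30:38])
                ysum += float(line[38:46])
                zsum += float(line[46:54])
    if natoms == 0:
        return (0, 0.0, 0.0, 0.0)
    return (natoms, xsum / natoms, ysum / natoms, zsum / natoms)

def atom_line(serial, x, y, z):
    """Format a made-up ATOM record with coordinates starting at column 30."""
    prefix = "%-6s%5d %-4s%1s%3s %1s%4d%1s   " % (
        "ATOM", serial, " CA", "", "GLY", "A", serial, "")
    return prefix + "%8.3f%8.3f%8.3f\n" % (x, y, z)

# Write two invented atoms plus a non-ATOM line to a temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.pdb', delete=False) as tmp:
    tmp.write("REMARK illustrative test file\n")
    tmp.write(atom_line(1, 0.0, 2.0, -4.0))
    tmp.write(atom_line(2, 2.0, 4.0, 4.0))
    tmp_path = tmp.name

result = calc_centroid_sketch(tmp_path)
os.remove(tmp_path)
print(result)   # (2, 1.0, 3.0, 0.0)
```

The REMARK line is skipped because it does not start with the ‘ATOM  ’ key, so only the two atom records contribute to the averages.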

Of course it’s possible that someone calls the calcCentroid() function with an argument that is not a PDB file, or even a file that does not exist. If the file does not exist, or you do not have permission to read it, then the function will raise a standard Python exception when it tries to open it (IOError, which in Python 3 is an alias of OSError; the specific subclasses raised are FileNotFoundError and PermissionError). If the file exists but is not a PDB file then most likely there will be no lines starting with the text ‘ATOM  ’ and so the function will just return the tuple (0, 0, 0, 0). It’s also possible in this case that there is a line starting with ‘ATOM  ’ (by coincidence) but it does not have three floating point numbers in columns 30 through 53, in which case a standard Python exception (ValueError) will be raised when the float() function is called.
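
Both failure modes are easy to provoke deliberately. In the sketch below the file name is an assumption (it is simply chosen so that it should not exist on your system) and the bad coordinate text is made up.

```python
# A file name that is assumed not to exist on this system.
missing = 'no_such_directory/no_such_file.pdb'

try:
    open(missing)
except IOError as err:     # in Python 3, IOError is the same class as OSError
    open_error = type(err).__name__

try:
    float('  abc   ')      # text found where a coordinate was expected
except ValueError as err:
    float_error = type(err).__name__

print(open_error, float_error)
```

Catching IOError (or OSError) covers both the missing-file and the permission case, because FileNotFoundError and PermissionError are subclasses of it.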

There is always a question as to how you deal with bad input to a function. There is no perfect answer. Sometimes you might want to raise standard Python exceptions. In other cases you might want to check for conditions that might lead to an exception and instead return some sensible default. Alternatively you might want to raise your own exception to give a more informative message to the user, rather than the standard Python one. It is a matter of taste and circumstance.
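
As one possible illustration of these options, the sketch below wraps the standard ValueError in a custom exception that reports the line number, and also shows the alternative of returning a default. The names PdbFormatError, parse_coords() and parse_coords_or_default() are hypothetical and are not part of the function presented above.

```python
class PdbFormatError(Exception):
    """Hypothetical exception for lines that do not look like valid PDB records."""

def parse_coords(line, line_number):
    """Extract (x, y, z) from an ATOM line, or raise PdbFormatError."""
    try:
        return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    except ValueError as err:
        message = 'line %d: cannot read coordinates from %r' % (
            line_number, line[30:54])
        # Chain the original error so that no detail is lost.
        raise PdbFormatError(message) from err

def parse_coords_or_default(line, default=None):
    """Alternative policy: return a default value instead of raising."""
    try:
        return parse_coords(line, 0)
    except PdbFormatError:
        return default

# A line that starts with the key by coincidence but holds no coordinates.
bad_line = 'ATOM  is also an ordinary English word in some text files'

try:
    parse_coords(bad_line, 1)
except PdbFormatError as err:
    print('Problem:', err)

print(parse_coords_or_default(bad_line))
```

Which policy is preferable depends on the caller: a batch script may want to skip bad lines quietly, whereas an interactive tool may prefer to stop and report exactly where the file went wrong.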



