Python Programming for Biology: Bioinformatics and Beyond


Using Python for macromolecular structures



Tim J. Stevens, Wayne Boucher



In the following Python examples we will mostly examine and manipulate existing structural data, i.e., the coordinates of atoms. The idea is that you should become familiar with how to handle structural information. We deliberately avoid going into the computational aspects of how to determine structures in the first place, and leave such vast and specialist topics to your future diligence.

Obtaining structure data

Before we can begin to manipulate macromolecular structure data we must first get hold of the coordinate information. If you are using the downloadable material that goes with this book,⁸ there will be a few example structure files saved in Protein Data Bank (PDB) file format. Alternatively, we can use Python to download data directly from the PDB website's download service. The following code does this with the urllib module (urllib2 in Python 2), which is part of Python's standard library. This module does all the hard work: we use it to send a request to the PDB web service, whose response is a plain-text file containing the structural data, and all we need to specify is the identifier (pdbId) of the entry that we wish to download.

First we import the web-handling urlopen() function. The module it resides in changed between Python 2 and Python 3, so we try the Python 3 form first and, if that raises an ImportError, fall back to the Python 2 form using try / except (you could also check whether sys.version[0] is '3').

try:
    # Python 3
    from urllib.request import urlopen

except ImportError:
    # Python 2
    from urllib2 import urlopen
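The alternative mentioned above, checking the interpreter version explicitly instead of catching ImportError, could look like the sketch below. It uses sys.version_info rather than sys.version: version_info holds integers, so the comparison does not depend on the version being a single character long.

```python
import sys

# Branch on the interpreter's major version number instead of
# relying on an ImportError to detect Python 2.
if sys.version_info[0] >= 3:
    from urllib.request import urlopen   # Python 3 location
else:
    from urllib2 import urlopen          # Python 2 location
```

Both approaches give the same urlopen() name afterwards; the try / except form has the small advantage of testing for the thing you actually need (the module) rather than a proxy for it (the version number).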

Next we define a Python string containing the URL from which the PDB data can be downloaded. Note that it is a string-formatting template, where %s marks the place the database identifier will be inserted.

# The older address, http://www.rcsb.org/pdb/cgi/export.cgi/, has been
# retired; files.rcsb.org is the current RCSB download location.
PDB_URL = 'https://files.rcsb.org/download/%s.pdb'

The function is then defined. It accepts as arguments an identifier and an optional name for the file where the PDB data will be saved. If no file name is given (or it evaluates to False, like an empty string), the file name is made by adding '.pdb' to the database identifier.

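def downloadPDB(pdbId, fileName=None):

    # Default the file name to the identifier plus '.pdb'
    if not fileName:
        fileName = '%s.pdb' % pdbId

    # Fetch the entry; in Python 3 read() returns bytes, so decode them
    response = urlopen(PDB_URL % pdbId)
    data = response.read().decode('utf-8')

    # Write the text out; 'with' closes the file automatically
    with open(fileName, 'w') as fileObj:
        fileObj.write(data)

    return fileName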

We use the web-reading urlopen() function to generate what is called a response object. Its read() method then fetches the contents of the PDB file. In Python 3 this comes back as bytes, not as a string, so to write it to a text file it must first be decoded into a string, here using UTF-8. This extra decoding step is not needed in Python 2. The string is then simply written to the file, and the file name that was used is returned at the end. If you were to use this function regularly, it would be advisable to add a few checks in case things go wrong: check that the URL query really worked, and perhaps warn the user before overwriting an existing file. The function is easily tested; here it generates a file with the default name '1A12.pdb'.

fileName = downloadPDB('1A12')
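The checks suggested above could be sketched as follows. This is a hypothetical variant, downloadPDBChecked(), not the book's function. It assumes Python 3 and the current RCSB address https://files.rcsb.org/download/, and it takes an opener argument (defaulting to urlopen) so that it can be exercised without a network connection. Treating the absence of ATOM/HETATM records as a failed query is a heuristic, not an official validity test.

```python
import os
from urllib.request import urlopen
from urllib.error import URLError

# Assumed download template for this sketch (current RCSB file service)
CHECKED_PDB_URL = 'https://files.rcsb.org/download/%s.pdb'


def downloadPDBChecked(pdbId, fileName=None, overwrite=False, opener=urlopen):
    """Download a PDB entry with basic sanity checks (hypothetical helper)."""
    if not fileName:
        fileName = '%s.pdb' % pdbId

    # Refuse to silently overwrite an existing file unless asked to
    if os.path.exists(fileName) and not overwrite:
        raise FileExistsError('%s already exists; use overwrite=True' % fileName)

    # HTTPError (e.g. 404 for an unknown ID) is a subclass of URLError
    try:
        response = opener(CHECKED_PDB_URL % pdbId)
        data = response.read().decode('utf-8')
    except URLError as err:
        raise IOError('Could not fetch %s: %s' % (pdbId, err))

    # Coordinate files contain ATOM/HETATM lines; error pages do not
    lines = data.splitlines()
    if not any(line.startswith(('ATOM', 'HETATM')) for line in lines):
        raise ValueError('No ATOM/HETATM records found for %s' % pdbId)

    with open(fileName, 'w') as fileObj:
        fileObj.write(data)

    return fileName
```

With a network connection, downloadPDBChecked('1A12', overwrite=True) would fetch the same entry as before, but it raises an exception instead of quietly writing an error page or replacing an existing file.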

For most of the subsequent examples we will be working with the simple structure data model that was described in Chapter 8, which is available with the web material in the Modelling.py file. Hence, to test these functions you will need to load the PDB file data into our Structure class of objects, as illustrated below:

from Modelling import getStructuresFromFile

strucObjs = getStructuresFromFile(fileName)
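If Modelling.py is not at hand, the sketch below shows roughly what reading a PDB file involves. readAtomRecords() is a hypothetical stand-in, not part of the book's Modelling module, and it returns plain dictionaries rather than Structure objects. It relies on the fixed-column layout of PDB ATOM/HETATM records (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, x, y, z in 31-54) and ignores alternate locations, occupancies and model boundaries, so atoms from every model in a multi-model file end up in one list.

```python
def readAtomRecords(fileName):
    """Return one dict per ATOM/HETATM line (hypothetical helper)."""
    atoms = []
    with open(fileName) as fileObj:
        for line in fileObj:
            # Record type occupies columns 1-6 (Python slice 0:6)
            record = line[0:6].strip()
            if record not in ('ATOM', 'HETATM'):
                continue

            # Python slices are 0-based, so column N is index N-1
            atoms.append({
                'serial': int(line[6:11]),
                'name': line[12:16].strip(),
                'resName': line[17:20].strip(),
                'chain': line[21],
                'resSeq': int(line[22:26]),
                'coords': (float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])),
            })
    return atoms
```

For example, readAtomRecords('1A12.pdb') would give a list whose items can be inspected with atom['name'] or atom['coords']; the Structure classes used in the rest of the chapter organise the same information into chains, residues and atoms.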

Of course, our data model and object classes for macromolecular structures are fairly simple, so that they can serve as examples in this book. If you require a more comprehensive, if more complex, set of objects, the Bio.PDB modules in BioPython can be used as an alternative. Some of the basics of these modules are described briefly towards the end of this chapter.



