In the following Python examples we will mostly examine and manipulate existing
structural data, i.e. the coordinates of atoms. The idea is that you should become familiar
with how to handle structural information. We deliberately avoid going into the
computational aspects of how to determine structures in the first place. We will leave such
vast and specialist topics to your future diligence.
Obtaining structure data
Before we can begin to manipulate macromolecular structure data we must initially get
hold of the coordinate information. Firstly, if you are using the downloadable material that
goes with this book, there will be a few example files of structures saved in Protein Data
Bank (PDB) file format. Alternatively, we could use the power of Python to download
data directly from the PDB website’s download service. The following code achieves this
by making use of the urllib module (in Python 3; urllib2 in Python 2), which is a standard
part of a Python installation. This module will do all the hard work and we will use it to
send a request to the PDB web service, the response to which will be a plain text file
containing the required structural data, and all we need to do is specify the identifier
(pdbId) of the entry that we wish to download.
Initially, we import the web-handling urlopen() function. The module this resides in
changed from Python 2 to Python 3 so we first try the Python 3 form and if that does not
work then try the Python 2 form, using a try / except (you could also check whether
sys.version[0] is ‘3’).
try:
    # Python 3
    from urllib.request import urlopen
except ImportError:
    # Python 2
    from urllib2 import urlopen
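The explicit version check mentioned above could be sketched as follows. This is only an illustration of the alternative: it tests sys.version_info, the tuple form of the interpreter version, rather than the first character of the sys.version string, and it ends up with the same urlopen() function as the try / except form.

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3: urlopen() lives in urllib.request
    from urllib.request import urlopen
else:
    # Python 2: urlopen() lives in urllib2
    from urllib2 import urlopen
```

The try / except form used in the text has the advantage that it does not depend on knowing which version introduced the change, only on whether the import succeeds.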
Then define a Python string that contains the URL where the PDB data can be
downloaded from, noting that it is a formatted string template with %s indicating where
the database identifier will be inserted.
PDB_URL = 'http://www.rcsb.org/pdb/cgi/export.cgi/' \
          '%s.pdb?format=PDB&compression=None'
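To see how the template works, the short sketch below fills in the identifier by hand. The two adjacent string literals are joined by Python into one string, and the % operator then substitutes the identifier for %s. The variable name exampleUrl is only for illustration, and the address is simply the one used in the text; web service addresses can change over time, so it may need updating.

```python
PDB_URL = 'http://www.rcsb.org/pdb/cgi/export.cgi/' \
          '%s.pdb?format=PDB&compression=None'

# Insert the database identifier where the %s placeholder sits
exampleUrl = PDB_URL % '1A12'

print(exampleUrl)
```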
The function is then defined, and accepts an identifier and an optional file name, where
the PDB data will be saved, as arguments. If no file name is specified (or if it evaluates
to False in a conditional test, like an empty string) then the file name is created by
adding ‘.pdb’ to the database identifier.
def downloadPDB(pdbId, fileName=None):

    if not fileName:
        fileName = '%s.pdb' % pdbId

    response = urlopen(PDB_URL % pdbId)
    data = response.read().decode('utf-8')

    fileObj = open(fileName, 'w')
    fileObj.write(data)
    fileObj.close()

    return fileName
We use the web-reading urlopen() function to generate what is called a response object.
This object's read() method is then used to fetch the contents of the PDB file. In
Python 3 this comes back as bytes, not as a string, and in order to be able to write it to a
file it needs to be converted to a string via a decoding, here using UTF-8. This extra
decoding step is not needed in Python 2. This string is then simply written to file. The file
name that was used is then returned at the end. Note that if you were to use this function
regularly it would be advisable to add a few checks, just in case things go wrong; check
that the URL query really worked and maybe warn the user if attempting to overwrite an
existing file. The function is easily tested, in this case to generate a file with a defaulted
name of ‘1A12.pdb’.
fileName = downloadPDB('1A12')
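As a rough idea of what such checks might look like, here is a minimal sketch of a more defensive variant. Everything beyond the original function is an assumption made for illustration: the name downloadPDBChecked, the overwrite flag, the opener argument (which only exists so that the function can be tried with a stand-in for urlopen(), without a network connection) and the rather crude test that the returned text contains at least one ATOM or HETATM record. Note that urlopen() itself will already raise an exception if the server cannot be reached or reports an error.

```python
import os

try:
    # Python 3
    from urllib.request import urlopen
except ImportError:
    # Python 2
    from urllib2 import urlopen

PDB_URL = 'http://www.rcsb.org/pdb/cgi/export.cgi/' \
          '%s.pdb?format=PDB&compression=None'


def downloadPDBChecked(pdbId, fileName=None, overwrite=False,
                       opener=urlopen):
    # Hypothetical variant of downloadPDB(); the extra arguments
    # and checks are assumptions, not part of the original function

    if not fileName:
        fileName = '%s.pdb' % pdbId

    # Refuse to replace an existing file unless explicitly allowed
    if os.path.exists(fileName) and not overwrite:
        raise IOError('File %s already exists' % fileName)

    response = opener(PDB_URL % pdbId)
    data = response.read().decode('utf-8')

    # Crude sanity check: a PDB file with coordinates should contain
    # at least one line starting with ATOM or HETATM
    lines = data.splitlines()
    if not any(line.startswith(('ATOM', 'HETATM')) for line in lines):
        raise ValueError('No coordinate records found for %s' % pdbId)

    # The with statement closes the file even if writing fails
    with open(fileName, 'w') as fileObj:
        fileObj.write(data)

    return fileName
```

This would be called in the same way as before, for example downloadPDBChecked('1A12', overwrite=True) to deliberately replace an earlier download. Raising an exception rather than printing a warning is a design choice; it leaves the calling code to decide how to respond.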
For most of the subsequent examples we will be working with the simple structure data
model that was described in Chapter 8 and which is available with the web material in the
Modelling.py file. Hence, to be able to test these functions, you will need to load the PDB
file data into our Structure class of objects as illustrated below:
from Modelling import getStructuresFromFile
strucObjs = getStructuresFromFile(fileName)
Of course our data model and object classes for macromolecular structure are fairly
simple, so they can be used as examples in this book. If you require a more complex but
comprehensive set of objects, the Bio.PDB modules in BioPython can be used as an
alternative. Some of the basics of these modules are described briefly towards the end of
this chapter.