We will now look briefly at alternative ways of using macromolecular data in Python. The
first of these is the PDB module of BioPython, which provides a more capable and tested
alternative to our very simple structure data model. The second is the graphical program
PyMol, which can be imported as a Python module so that coordinate data can be rendered
as pretty pictures. (See
The PDB sub-module of BioPython contains functionality to read, write, manipulate and
investigate macromolecular structure data. In the following example, after loading the
PDB module, we make a parser object that can make a PDB.Structure object using the
data from a file. Although this method may seem a little clunky compared to our earlier
examples of a single function that can be used to import the data, having an intermediate
object does allow the programmer to have more line-by-line control of the file reading.
Once furnished with the main structure object, we can then extract the first set of
coordinates (first conformation) from the structure and loop through all of the chains,
residues and atoms to get at the coordinates in a convenient manner, noting that the names
conformation = struc[0]
for chain in conformation:
for residue in chain:
atomNames = [a.name for a in residue]
print(chain.id, residue.id[1], residue.resname, atomNames)
caAtom = residue['CA']
print(caAtom.name, caAtom.coord, caAtom.bfactor)
Writing the data to disk requires calling a function to define a special object (called
writer here) which can be used to save the file:
outFileName = 'test.pdb'
writer = PDB.PDBIO()
writer.set_structure(struc)
writer.save(outFileName)
Referring back to one of the early examples we gave in this chapter, to find the atoms
within a given distance of a specified point, we can write an equivalent function that uses
the BioPython PDB.Structure.Structure object, rather than the simple Structure object we
defined in
Chapter 8
. The mathematics of the function are the same as in
findCloseAtoms(), but the collection of coordinates and atom objects is naturally different
because of the different data model. Note that in this example we have been a little more
rigorous and check, using the inbuilt isinstance() function, that the input object is of the
required type: PDB.Structure.Structure. Looping through the chains, residues and atoms is
fairly straightforward, although the data model is slightly different. For example, here we
use a conformation (coordinate model) number to specify which coordinate set to use
within the structure object; our bespoke model is simpler and loads alternative
conformations as entirely separate objects.
def findCloseAtomsBioPy(structure, xyz, conf=0, limit=5.0):
if not isinstance(structure, PDB.Structure.Structure):
raise Exception('Structure must be Bio.PDB.Structure class')
closeAtoms = []
xyz = array(xyz)
limit2 = limit * limit
coords = []
atoms = []
confModel = structure[conf]
for chain in confModel:
for residue in chain:
for atom in residue:
coords.append(atom.coord)
atoms.append(atom)
deltas = coords - xyz
squares = deltas * deltas
sumSquares = squares.sum(axis=1)
boolArray = sumSquares < limit2
indices = boolArray.nonzero()[0]
closeAtoms = [atoms[i] for i in indices]
return closeAtoms
Do'stlaringiz bilan baham: