Python Programming for Biology: Bioinformatics and Beyond

External macromolecular structure modules

Download 7,75 Mb.

Pdf ko'rish

bet	232/514
Sana	30.12.2021
Hajmi	7,75 Mb.
	#91066

1 ... 228 229 230 231 232 233 234 235 ... 514

Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Structures in BioPython

External macromolecular structure modules

We will now look briefly at alternative ways of using macromolecular data in Python. The

first of these is the PDB module of BioPython, which provides a more capable and tested

alternative to our very simple structure data model. The second is the graphical program

PyMol, which can be imported as a Python module so that coordinate data can be rendered

as pretty pictures. (See

http://www.cambridge.org/pythonforbiology

for download links to

BioPython and PyMol.)

Structures in BioPython

The PDB sub-module of BioPython contains functionality to read, write, manipulate and

investigate macromolecular structure data. In the following example, after loading the

PDB module, we make a parser object that can make a PDB.Structure object using the

data from a file. Although this method may seem a little clunky compared to our earlier

examples of a single function that can be used to import the data, having an intermediate

object does allow the programmer to have more line-by-line control of the file reading.

from Bio import PDB

fileName = 'examples/1UST.pdb'

parser = PDB.PDBParser()

struc = parser.get_structure('Name', fileName)

Once furnished with the main structure object, we can then extract the first set of

coordinates (first conformation) from the structure and loop through all of the chains,

residues and atoms to get at the coordinates in a convenient manner, noting that the names

of object attributes differ from the previous data model:

conformation = struc[0]

for chain in conformation:

for residue in chain:

atomNames = [a.name for a in residue]

print(chain.id, residue.id[1], residue.resname, atomNames)

caAtom = residue['CA']

print(caAtom.name, caAtom.coord, caAtom.bfactor)

Writing the data to disk requires calling a function to define a special object (called

writer here) which can be used to save the file:

outFileName = 'test.pdb'

writer = PDB.PDBIO()

writer.set_structure(struc)

writer.save(outFileName)

Referring back to one of the early examples we gave in this chapter, to find the atoms

within a given distance of a specified point, we can write an equivalent function that uses

the BioPython PDB.Structure.Structure object, rather than the simple Structure object we

defined in

Chapter 8

. The mathematics of the function are the same as in

findCloseAtoms(), but the collection of coordinates and atom objects is naturally different

because of the different data model. Note that in this example we have been a little more

rigorous and check, using the inbuilt isinstance() function, that the input object is of the

required type: PDB.Structure.Structure. Looping through the chains, residues and atoms is

fairly straightforward, although the data model is slightly different. For example, here we

use a conformation (coordinate model) number to specify which coordinate set to use

within the structure object; our bespoke model is simpler and loads alternative

conformations as entirely separate objects.

def findCloseAtomsBioPy(structure, xyz, conf=0, limit=5.0):

if not isinstance(structure, PDB.Structure.Structure):

raise Exception('Structure must be Bio.PDB.Structure class')

closeAtoms = []

xyz = array(xyz)

limit2 = limit * limit

coords = []

atoms = []

confModel = structure[conf]

for chain in confModel:

for residue in chain:

for atom in residue:

coords.append(atom.coord)

atoms.append(atom)

deltas = coords - xyz

squares = deltas * deltas

sumSquares = squares.sum(axis=1)

boolArray = sumSquares < limit2

indices = boolArray.nonzero()[0]

closeAtoms = [atoms[i] for i in indices]

return closeAtoms

Download 7,75 Mb.

Do'stlaringiz bilan baham:

1 ... 228 229 230 231 232 233 234 235 ... 514