Python Programming for Biology: Bioinformatics and Beyond

[Tim J. Stevens, Wayne Boucher] Python Programming

Structural subsets

Next we will consider the dissection of molecular structures into smaller parts, which is useful in many situations. You may, for example, want to remove a flexible region from the analysis of your molecule. Alternatively, you might want to select only a certain kind of residue or certain kinds of atoms. The latter may be done to define the backbone path of the molecular chain, which is useful when comparing structures with dissimilar sequences.

The example Python function we describe makes a subset of a structure by making a restricted copy of another structure, including only the atoms that are required. Alternative approaches would be to remove atoms from an existing structure, or to load only certain atoms in the first place; these may save a little computer memory. Firstly we import the definitions of the classes of structural objects we wish to make:

from numpy import array  # used further on to copy atom coordinates
from Modelling import Structure, Chain, Residue, Atom

A function is then defined which takes an input structure and three other, optional, arguments that specify which chains, residues and atoms to consider. If any of these arguments is not specified (so defaults to None), it is taken to mean that no filtering is done for that kind of component and all are included. The chainCodes argument is assumed to be a collection of letter codes, e.g. ['A', 'B'], residueIds is assumed to be a collection of residue numbers and atomNames, as you might expect, a collection of atom names. You can use any of the common Python collection types here (list, tuple or set), although these will be converted to sets using set() to remove repeats and to give fast membership tests.
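As a small illustration of the conversion described above, the following sketch shows that a list, tuple or set supplied by the caller all reduce to the same set, with repeats discarded. The variable names are invented for this example and are not part of the Modelling module.

```python
# Three ways a caller might pass the same selection of atom names
asList = ['N', 'CA', 'C', 'CA']   # note the repeated 'CA'
asTuple = ('N', 'CA', 'C')
asSet = {'C', 'CA', 'N'}

# set() accepts any of these collections and discards repeats
selections = [set(x) for x in (asList, asTuple, asSet)]
print(selections[0] == selections[1] == selections[2])  # True

# Membership tests are what the filtering loops rely on
print('CA' in selections[0])  # True
print('O' in selections[0])   # False
```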

def filterSubStructure(structure, chainCodes=None,
                       residueIds=None, atomNames=None):

Within the function we determine a name for the new Structure object we are going to make by using the template Structure object's name, and then adding '_filter' plus other strings that list which chain codes, residue numbers (converted to strings) and atom names we have selected. Note how we first check whether a chain, residue or atom specification was given (not None and not empty, and hence true) before the name is extended.

  name = structure.name + '_filter'

  if chainCodes:
    name += ' ' + ','.join(chainCodes)
    chainCodes = set(chainCodes)

  if residueIds:
    name += ' ' + ','.join([str(x) for x in residueIds])
    residueIds = set(residueIds)

  if atomNames:
    name += ' ' + ','.join(atomNames)
    atomNames = set(atomNames)
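To see what kind of name this produces, the naming step can be tried on its own. The sketch below wraps it in a helper called buildFilterName(), which is a hypothetical name used only for this illustration; the base name 'demo' is likewise arbitrary. Lists are passed so that the order within the name is predictable, since a set has no guaranteed order.

```python
def buildFilterName(baseName, chainCodes=None, residueIds=None, atomNames=None):
  """Hypothetical helper: compose a descriptive name for a filtered structure."""
  name = baseName + '_filter'
  if chainCodes:
    name += ' ' + ','.join(chainCodes)
  if residueIds:
    # Residue numbers are integers, so convert them before joining
    name += ' ' + ','.join(str(x) for x in residueIds)
  if atomNames:
    name += ' ' + ','.join(atomNames)
  return name

print(buildFilterName('demo', ['A'], None, ['N', 'CA', 'C']))
# demo_filter A N,CA,C
print(buildFilterName('demo', residueIds=[5, 6, 7]))
# demo_filter 5,6,7
print(buildFilterName('demo'))
# demo_filter
```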

Next the class definition for Structure is used to make a new instance of that kind of object, which we refer to as filterStruc. Although we defined a new name for this new object, we keep the conformation number and PDB identifier from the original; these indicate the origin of the data, and have not changed.

  conf = structure.conformation
  pdbId = structure.pdbId
  filterStruc = Structure(name=name, conformation=conf, pdbId=pdbId)

The main body of the function loops through all of the chains, residues and atoms of the input, selecting only those we wish to duplicate. Thus first we go through each Chain object and, if we have specified a filtering list for its code (chainCodes), we exclude any that are not mentioned; the rest of the loop body, and hence the chain, is skipped by using the continue command. If a chain is not excluded then we initialise a list that will contain residues to copy:

  for chain in structure.chains:
    if chainCodes and (chain.code not in chainCodes):
      continue

    includeResidues = []

For each included chain we loop through its Residue objects and perform a similar check to see if the residue should be included. If the residueIds argument was filled but the residue number is not present then that residue is skipped. Otherwise, we go on to collect a list of atoms.

    for residue in chain.residues:
      if residueIds and (residue.seqId not in residueIds):
        continue

      includeAtoms = []

Again, in the same way, we check whether each atom's name is among those to include and, if so, the atom is added to the list of template Atom objects.

      for atom in residue.atoms:
        if atomNames and (atom.name not in atomNames):
          continue

        includeAtoms.append(atom)

If we have notionally decided to include a particular residue but that residue does not contain any of the required atom types, then there is no need to copy this residue at all. When there are some atoms to copy for this residue, i.e. includeAtoms is not empty, both the list of atoms and the Residue object are placed in the includeResidues list. We could have placed the atoms in a big list on their own, but it is convenient to keep them with the corresponding residue, given that we need to specify the Residue (parent object) when making an Atom (child object).

      if includeAtoms:
        includeResidues.append( (residue, includeAtoms) )
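The selection logic so far can be tried without any structure classes. The following is a minimal sketch that assumes a plain dictionary mapping residue numbers to atom names as a stand-in for the real Residue and Atom objects; the data are invented for illustration. It shows how a filter left as None lets everything through, and how a residue with no matching atoms is dropped.

```python
# Stand-in data (invented): residue number -> atom names in that residue
residues = {
  1: ['N', 'CA', 'C', 'O', 'CB'],  # an alanine-like residue
  2: ['O'],                        # a water-like residue, no backbone atoms
  3: ['N', 'CA', 'C', 'O'],        # a glycine-like residue
}

residueIds = None                  # None means no residue filter
atomNames = set(['N', 'CA', 'C'])

includeResidues = []
for seqId in sorted(residues):
  if residueIds and (seqId not in residueIds):
    continue

  includeAtoms = []
  for atomName in residues[seqId]:
    if atomNames and (atomName not in atomNames):
      continue
    includeAtoms.append(atomName)

  if includeAtoms:                 # skip residues left with nothing to copy
    includeResidues.append((seqId, includeAtoms))

print(includeResidues)
# [(1, ['N', 'CA', 'C']), (3, ['N', 'CA', 'C'])]
```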

If the residue list is not empty, we can make a new chain in the new Structure object, which is passed in at Chain creation to specify the parent link. With the chain now made we loop through the list of residues and corresponding atoms to make new Residue and Atom objects in the new structure. Notice that we use the attributes of the original objects when making the new ones. Thus, the residue copies will have the same number and code, and the new atoms will have the same names and coordinates (albeit in a new array). Also, remember when making these objects within our structure we always have to specify the parent object, going up the data model hierarchy.

    if includeResidues:
      filterChain = Chain(filterStruc, chain.code, chain.molType)

      for residue, atoms in includeResidues:
        filterResidue = Residue(filterChain, residue.seqId,
                                residue.code)

        for atom in atoms:
          coords = array(atom.coords)
          Atom(filterResidue, name=atom.name, coords=coords)

Finally, the new Structure object, with its selectively copied components, is passed back from the function:

  return filterStruc

The function can be tested by specifying the chain, residue and atom selection. Here we select chain 'A', all residues (so the filter is None) and the backbone heavy atoms ['N', 'CA', 'C'].

chainCodes = set(['A'])
residueIds = None # No residue filter: all of them
atomNames = set(['N','CA','C']) # Heavy backbone atoms (not H)

chain_A_backbone = filterSubStructure(struc, chainCodes,
                                      residueIds, atomNames)
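For readers without the Modelling module to hand, the fragments above can be assembled and run against minimal stand-in classes. The sketch below is written under several assumptions: the four classes are simplified substitutes invented for this example (the real ones will hold more information), coordinates are plain lists rather than NumPy arrays, so list() takes the place of array() when copying, and the inner atom loop is condensed into a list comprehension. The tiny test structure and its coordinates are made up.

```python
class Structure:  # simplified stand-in, not the real Modelling class
  def __init__(self, name, conformation=0, pdbId=None):
    self.name, self.conformation, self.pdbId = name, conformation, pdbId
    self.chains = []

class Chain:  # stand-in; registers itself with its parent structure
  def __init__(self, structure, code, molType='protein'):
    self.structure, self.code, self.molType = structure, code, molType
    self.residues = []
    structure.chains.append(self)

class Residue:  # stand-in; registers itself with its parent chain
  def __init__(self, chain, seqId, code):
    self.chain, self.seqId, self.code = chain, seqId, code
    self.atoms = []
    chain.residues.append(self)

class Atom:  # stand-in; registers itself with its parent residue
  def __init__(self, residue, name, coords):
    self.residue, self.name, self.coords = residue, name, coords
    residue.atoms.append(self)

def filterSubStructure(structure, chainCodes=None,
                       residueIds=None, atomNames=None):
  name = structure.name + '_filter'
  if chainCodes:
    name += ' ' + ','.join(chainCodes)
    chainCodes = set(chainCodes)
  if residueIds:
    name += ' ' + ','.join([str(x) for x in residueIds])
    residueIds = set(residueIds)
  if atomNames:
    name += ' ' + ','.join(atomNames)
    atomNames = set(atomNames)

  filterStruc = Structure(name=name, conformation=structure.conformation,
                          pdbId=structure.pdbId)

  for chain in structure.chains:
    if chainCodes and (chain.code not in chainCodes):
      continue
    includeResidues = []
    for residue in chain.residues:
      if residueIds and (residue.seqId not in residueIds):
        continue
      includeAtoms = [atom for atom in residue.atoms
                      if not atomNames or atom.name in atomNames]
      if includeAtoms:
        includeResidues.append((residue, includeAtoms))

    if includeResidues:
      filterChain = Chain(filterStruc, chain.code, chain.molType)
      for residue, atoms in includeResidues:
        filterResidue = Residue(filterChain, residue.seqId, residue.code)
        for atom in atoms:
          # list() stands in for array(): a fresh copy of the coordinates
          Atom(filterResidue, name=atom.name, coords=list(atom.coords))

  return filterStruc

# Tiny two-chain test structure with made-up coordinates
struc = Structure('demo')
for chainCode in ('A', 'B'):
  demoChain = Chain(struc, chainCode)
  for seqId, resCode in ((1, 'ALA'), (2, 'GLY')):
    demoResidue = Residue(demoChain, seqId, resCode)
    for i, atomName in enumerate(('N', 'CA', 'C', 'O')):
      Atom(demoResidue, atomName, [float(i), 0.0, 0.0])

backbone = filterSubStructure(struc, ['A'], None, ['N', 'CA', 'C'])
print(backbone.name)  # demo_filter A N,CA,C
print([a.name for r in backbone.chains[0].residues for a in r.atoms])
# ['N', 'CA', 'C', 'N', 'CA', 'C']
```

Because each new atom receives its own copy of the coordinates, changing the coordinates in the filtered structure leaves the original untouched.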

We could make a dedicated, streamlined function to make a complete copy of a structure. However, using the above filterSubStructure() function without passing any chain, residue or atom selection results in a full copy of the input structure. Thus we could be cheeky and do the following to pretend we had a dedicated copy function:

def copyStructure(structure):

  return filterSubStructure(structure, None, None, None)
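For comparison only, a different technique from the one used in the text is the standard library's copy.deepcopy(), which duplicates an object together with everything it refers to. The sketch below assumes a simple hypothetical Node class with parent and child links as a stand-in for the structural objects; whether the real Modelling classes copy cleanly this way has not been checked here.

```python
import copy

class Node:
  """Hypothetical stand-in for a structural object with a parent link."""
  def __init__(self, name, parent=None):
    self.name = name
    self.parent = parent
    self.children = []
    if parent is not None:
      parent.children.append(self)

struc = Node('demo')
chain = Node('A', struc)
atom = Node('CA', chain)

strucCopy = copy.deepcopy(struc)

# The copy has the same shape but shares no objects with the original
print(strucCopy.children[0].children[0].name)     # CA
print(strucCopy.children[0] is chain)             # False

# Parent links in the copy point at the copied parent, not the original
print(strucCopy.children[0].parent is strucCopy)  # True
```

Unlike the filtering approach, this gives no control over which components are copied, and the copy keeps the original name.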
