Structural subsets
Next we consider how to dissect molecular structures into smaller parts, which is useful
in many situations. You may, for example, want to remove a flexible region
from the analysis of your molecule. Alternatively, you might want to select only a certain
kind of residue or certain kinds of atoms. The latter may be done to define the backbone
path of the molecular chain, which is useful when comparing structures with dissimilar
sequences.
The example Python function we describe makes a subset of a structure by making a
restricted copy of another structure, including only the atoms which are required.
Alternative methodologies might be to remove atoms from an existing structure, or only
load certain atoms in the first place, and these approaches may save a bit of computer
memory. Firstly we import the definitions of the classes of structural objects we wish to
make:
from numpy import array  # array() copies coordinates later on; assumed to be NumPy's
from Modelling import Structure, Chain, Residue, Atom
A function is then defined which takes an input structure and three other, optional,
arguments that specify which chains, residues and atoms to consider. If any of these
arguments is not specified (so defaults to None), it is taken to mean that no filtering is
done for that kind of component and all are included. The chainCodes argument is
assumed to be a collection of letter codes, e.g. [‘A’, ‘B’], the residueIds is assumed to be a
collection of residue numbers and atomNames, as you might expect, a collection of atom
names. You can use any of the common Python collection types here, list, tuple or set,
although these will be converted to sets using set() to remove repeats and give the best
speed performance.
def filterSubStructure(structure, chainCodes=None,
                       residueIds=None, atomNames=None):
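The conversion of the selection arguments to sets can be illustrated in isolation. This is only a small sketch with made-up values; the variable names residueIdSet and atomNameSet are introduced here for illustration and do not appear in the function itself.

```python
# Made-up selections given as a list (with repeats) and a tuple
residueIds = [10, 11, 11, 12, 10]
atomNames = ('N', 'CA', 'C')

# set() accepts any iterable and discards repeated values
residueIdSet = set(residueIds)
atomNameSet = set(atomNames)

# Membership tests on a set use hashing rather than a linear scan
hasAlphaCarbon = 'CA' in atomNameSet
hasOxygen = 'O' in atomNameSet
```

Because every atom of every residue is tested against these collections, the cheap membership test is where a set pays off for large structures.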
Within the function we determine a name for the new Structure object we are going to
make by using the template Structure object’s name, and then adding ‘_filter’ plus other
strings that list which chain codes, residue numbers (converted to strings) and atom names
we have selected. Note how we first check to see if a chain, residue or atom specification
was defined (not None and not empty, and hence true) before the name is extended.
    name = structure.name + '_filter'
    if chainCodes:
        name += ' ' + ','.join(chainCodes)
        chainCodes = set(chainCodes)
    if residueIds:
        name += ' ' + ','.join([str(x) for x in residueIds])
        residueIds = set(residueIds)
    if atomNames:
        name += ' ' + ','.join(atomNames)
        atomNames = set(atomNames)
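To see what names this produces, the naming step can be pulled out into a small helper. The function buildFilterName and the base name '1ABC' are hypothetical, used only for this sketch. It also shows that an empty collection is treated exactly like None, because both are false in an if test.

```python
def buildFilterName(baseName, chainCodes=None, residueIds=None,
                    atomNames=None):
    """Hypothetical helper that mirrors the naming logic only."""
    name = baseName + '_filter'
    if chainCodes:
        name += ' ' + ','.join(chainCodes)
    if residueIds:
        # Residue numbers are integers, so convert before joining
        name += ' ' + ','.join([str(x) for x in residueIds])
    if atomNames:
        name += ' ' + ','.join(atomNames)
    return name

# Chain and atom filters given, no residue filter
nameA = buildFilterName('1ABC', ['A'], None, ['N', 'CA', 'C'])

# An empty chain list behaves like None: no chain part is added
nameB = buildFilterName('1ABC', [], [5, 6], None)
```

If the selections are passed in as sets rather than lists, the order of the items in the name follows the set's iteration order, which is not guaranteed to match the order in which they were written.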
Next the class definition for Structure is used to make a new instance of that kind of
object, which we refer to as filterStruc. Although we defined a new name for this new
object, we keep the conformation number and PDB identifier from the original; these
indicate the origin of the data, and have not changed.
    conf = structure.conformation
    pdbId = structure.pdbId
    filterStruc = Structure(name=name, conformation=conf, pdbId=pdbId)
The main body of the function is to loop through all of the chains, residues and atoms
of the input selecting only those we wish to duplicate. Thus first we go through each
Chain object and, if we have specified a filtering list for its code (chainCodes), we exclude
any that are not mentioned; the loop, and hence chain, is skipped by using the continue
command. If a chain is not excluded then we initialise a list that will contain residues to
copy:
    for chain in structure.chains:
        if chainCodes and (chain.code not in chainCodes):
            continue
        includeResidues = []
For each included chain we loop through its Residue objects and perform a similar
check to see if the residue should be included. If the residueIds argument was filled but the
residue number is not present then that residue is skipped. Otherwise, we go on to collect a
list of atoms.
        for residue in chain.residues:
            if residueIds and (residue.seqId not in residueIds):
                continue
            includeAtoms = []
Again, in the same sort of way we check to see if each atom’s name is in our list of
things to include, and if successful the list of template Atom objects is expanded.
            for atom in residue.atoms:
                if atomNames and (atom.name not in atomNames):
                    continue
                includeAtoms.append(atom)
If we have notionally decided to include a particular residue but that residue does not
contain any of the required atom types, then there is no need to copy this residue at all.
When there are some atoms to copy for this residue, i.e. includeAtoms is not empty, both
the list of atoms and the Residue object are placed in the includeResidues list. We could
have placed the atoms in a big list on their own, but it is convenient to keep them with the
corresponding residue, given that we need to specify the Residue (parent object) when
making an Atom (child object).
            if includeAtoms:
                includeResidues.append((residue, includeAtoms))
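The skip-and-collect pattern used in these loops can be tried on plain Python data without any structure objects. In this sketch the dictionary residueAtoms and its contents are made up for illustration: it maps residue numbers to the atom names each residue contains.

```python
# Made-up data: residue number -> names of the atoms it contains
residueAtoms = {1: ['N', 'CA', 'C', 'O'],
                2: ['O', 'CB'],
                3: ['N', 'CA']}

residueIds = None              # No residue filter: consider all
atomNames = {'N', 'CA', 'C'}   # Keep only backbone heavy atoms

includeResidues = []
for seqId, names in residueAtoms.items():
    if residueIds and (seqId not in residueIds):
        continue               # Residue not selected: skip it

    includeAtoms = []
    for name in names:
        if atomNames and (name not in atomNames):
            continue           # Atom not selected: skip it
        includeAtoms.append(name)

    # Only keep residues that still have something to copy
    if includeAtoms:
        includeResidues.append((seqId, includeAtoms))
```

Residue 2 passes the residue check but has none of the requested atoms, so it never reaches includeResidues.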
If the residue list is not empty, we can make a new chain in the new Structure object,
which is passed in at Chain creation to specify the parent link. With the chain now made
we loop through the list of residues and corresponding atoms to make new Residue and
Atom objects in the new structure. Notice that we use the attributes of the original objects
when making the new ones. Thus, the residue copies will have the same number and code,
and the new atoms will have the same names and coordinates (albeit in a new array). Also,
remember when making these objects within our structure we always have to specify the
parent object, going up the data model hierarchy.
        if includeResidues:
            filterChain = Chain(filterStruc, chain.code, chain.molType)
            for residue, atoms in includeResidues:
                filterResidue = Residue(filterChain, residue.seqId,
                                        residue.code)
                for atom in atoms:
                    coords = array(atom.coords)
                    Atom(filterResidue, name=atom.name, coords=coords)
Finally in the function the new Structure object, with selectively copied components, is
passed back:
    return filterStruc
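For readers who want to run the whole function without the Modelling module to hand, the fragments above can be assembled as follows. The four classes here are hypothetical, minimal stand-ins written for this sketch, not the real Modelling classes: they hold only the attributes the function touches, and a plain list stands in for the coordinate array. The structure named 'demo' and its identifier 'XXXX' are made up.

```python
# Hypothetical stand-ins for the Modelling classes; each child
# registers itself with the parent it is given at creation.
class Structure:
    def __init__(self, name, conformation=0, pdbId=None):
        self.name = name
        self.conformation = conformation
        self.pdbId = pdbId
        self.chains = []

class Chain:
    def __init__(self, structure, code, molType='protein'):
        self.structure, self.code, self.molType = structure, code, molType
        self.residues = []
        structure.chains.append(self)

class Residue:
    def __init__(self, chain, seqId, code):
        self.chain, self.seqId, self.code = chain, seqId, code
        self.atoms = []
        chain.residues.append(self)

class Atom:
    def __init__(self, residue, name, coords):
        self.residue, self.name = residue, name
        self.coords = list(coords)  # New list: stands in for array()
        residue.atoms.append(self)

def filterSubStructure(structure, chainCodes=None,
                       residueIds=None, atomNames=None):
    name = structure.name + '_filter'
    if chainCodes:
        name += ' ' + ','.join(chainCodes)
        chainCodes = set(chainCodes)
    if residueIds:
        name += ' ' + ','.join([str(x) for x in residueIds])
        residueIds = set(residueIds)
    if atomNames:
        name += ' ' + ','.join(atomNames)
        atomNames = set(atomNames)

    filterStruc = Structure(name=name,
                            conformation=structure.conformation,
                            pdbId=structure.pdbId)

    for chain in structure.chains:
        if chainCodes and (chain.code not in chainCodes):
            continue
        includeResidues = []
        for residue in chain.residues:
            if residueIds and (residue.seqId not in residueIds):
                continue
            includeAtoms = []
            for atom in residue.atoms:
                if atomNames and (atom.name not in atomNames):
                    continue
                includeAtoms.append(atom)
            if includeAtoms:
                includeResidues.append((residue, includeAtoms))

        if includeResidues:
            filterChain = Chain(filterStruc, chain.code, chain.molType)
            for residue, atoms in includeResidues:
                filterResidue = Residue(filterChain, residue.seqId,
                                        residue.code)
                for atom in atoms:
                    Atom(filterResidue, name=atom.name,
                         coords=atom.coords)
    return filterStruc

# Made-up input: two chains, each with a single alanine residue
struc = Structure('demo', conformation=0, pdbId='XXXX')
for chainCode in ('A', 'B'):
    demoChain = Chain(struc, chainCode)
    demoResidue = Residue(demoChain, 1, 'ALA')
    for atomName in ('N', 'CA', 'C', 'O', 'CB'):
        Atom(demoResidue, atomName, (0.0, 0.0, 0.0))

backbone = filterSubStructure(struc, ['A'], None, ['N', 'CA', 'C'])
backboneNames = [a.name for a in backbone.chains[0].residues[0].atoms]
```

The input structure is left untouched: the filtered result has its own chain, residue and atom objects, and its coordinates live in separate lists.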
The function can be tested by specifying the chain, residue and atom selection, assuming
struc is a Structure object that has already been loaded. Here we select chain ‘A’, all
residues (so the filter is None) and the backbone heavy atoms [‘N’, ‘CA’, ‘C’].
chainCodes = set(['A'])
residueIds = None # No residue filter: all of them
atomNames = set(['N','CA','C']) # Heavy backbone atoms (not H)
chain_A_backbone = filterSubStructure(struc, chainCodes,
                                      residueIds, atomNames)
We could make a dedicated, streamlined function to make a complete copy of a
structure. However, using the above filterSubStructure() function without passing any
chain, residue or atom selection results in a full copy of the input structure. Thus we could
be cheeky and do the following to pretend we had a dedicated copy function:
def copyStructure(structure):
    return filterSubStructure(structure, None, None, None)