Simple geometric manipulations
With some structure data in hand the job now is to get used to working with its various
components. Given that the data we are working with consists of three-dimensional
coordinates, many of the manipulations involve thinking about geometry. Accordingly, the
first example is designed to find the centre of mass (centroid) of a structure. Although
similar functionality has been illustrated earlier, in
Chapter 5
, this time we are using the
objects in our structure data model. This is done so that the loading and saving of data are
entirely separate from the operations we want to focus on, thus making things clearer and
more general.
Firstly, a dictionary is defined to give the atomic numbers of the chemical elements
commonly found in biological molecules. If precise results were really important to you,
the dictionary would contain many more elements, and list atomic masses rather than the
atomic number (number of protons); however, this dictionary is sufficient for most
purposes; the most important thing is that hydrogen has less weight. There is also an
import for the NumPy function zeros, so the centre of mass can be initialised as a
numpy.array object. Importing from the numpy module will simplify our code by allowing
us to perform operations, quickly and with a minimal number of commands, on entire
arrays of numbers at once (see
Chapter 9
). The numpy module is often not part of the
standard Python installation, but its use is exceedingly widespread and it is available to
download via the Scientific Python website (see
http://www.scipy.org
).
The function getCenterOfMass is defined, accepting a Structure object (as we describe
in
Chapter 8
) to work on as an argument. Inside the function the variables to represent the
central position and the total mass are set to zero. Then we have three nested loops, which
iterate over all of the chains, residues and atoms, all according to the structure of our data
model objects, thus accessing each of the atoms in the structure in turn.
from numpy import zeros
ATOMIC_NUMS = {'H':1, 'C':12, 'N':14, 'O':16, 'P':31, 'S':32}
def getCenterOfMass(structure):
centerOfMass = zeros(3, float)
totalMass = 0.0
for chain in structure.chains:
for residue in chain.residues:
for atom in residue.atoms:
mass = ATOMIC_NUMS.get(atom.element, 12.0)
centerOfMass += mass * atom.coords
totalMass += mass
centerOfMass /= totalMass
return centerOfMass
The nested loops get to the individual atoms, whereupon we use the symbol for the
atom’s chemical element (atom.element) as key in the dictionary of atomic numbers to
extract an approximate relative mass for the atom. Note that if the key is not present, we
get a mass equivalent to carbon. The mass is then multiplied with the atomic coordinates,
in an element-wise fashion given that centerOfMass and atom.coords are both NumPy
arrays, and also added to the totals. In this way heavier atoms will contribute more to the
total compared to lighter ones, e.g. carbon will contribute 12 times as much as hydrogen.
The mass is also added to a separate total, to record the combined mass of all atoms. Once
the loops are finished the array for the coordinate totals is then divided by the total mass to
get the per-atom average. Of course if we were not using a weighted summation, we could
divide by the number of atoms.
We can test the function on data loaded as a Structure object:
struc = getStructuresFromFile('examples/1A12.pdb')[0]
print(getCenterOfMass(struc))
In the next examples we will actually change structural coordinates. Usually we don’t
want to mess about with the coordinates too much; however, it is very common to move
(translate) and rotate a structure, to reposition it while still leaving the relative position of
the coordinates unaffected. The Python example for rotating a structure involves using
what is often called the rotation matrix. For three-dimensional space a rotation matrix is a
three by three square array of numbers which represent the transformation of Cartesian (x,
y and z) coordinates. In essence, to use the matrix we take a vector containing a positional
coordinate and multiply it (matrix style) to get a new vector, which is now rotated relative
to the origin (the zero point on all axes).
The function that generates the rotation matrix is the one described in
Chapter 9
, and
can be imported from the corresponding example module and is used inside the function
below, which takes a given structure and rotates it by a given angle about a given axis.
Naturally the molecular structure, axis vector and angle are passed in as arguments,
although we have defaults set to rotation about the x axis (1,0,0) and zero angle. The
getRotationMatrix function operates on the axis and angle to produce the matrix, and we
convert it to an array so that we can use the dot() function, which calculates the dot
product between arrays; and if we use it on matrices this is the same thing as matrix
multiplication. With the rotation matrix defined the task is then to loop through all of the
chains, residues and atoms, to get each atom in turn so that we can rotate its coordinates.
from numpy import array, dot
from Maths import getRotationMatrix
from math import pi
def rotateStructure(structure, axis=(1,0,0), angle=0):
rMatrix = array(getRotationMatrix(axis, angle))
for chain in structure.chains:
for residue in chain.residues:
for atom in residue.atoms:
newCoords = dot(rMatrix, atom.coords)
atom.coords = newCoords
rotateStructure(struc, (0,1,0), pi/2)
For each atom we calculate the dot product of the rotation matrix and the original
coordinate array; this is performing the matrix multiplication operation. Once the new,
rotated, coordinates are fully defined, the attribute of the atom object is updated
accordingly (remembering that it is stored as an array). Note that because we are
manipulating the structure internally we don’t need to return anything from the function.
Given that we passed the structure into the function in the first place it will be available
after the function is run, albeit now with rotated atom coordinates.
Do'stlaringiz bilan baham: |