Aligning a structure ensemble
Given that we now have a mathematical function that can find the optimum rotation to
superimpose two arrays of coordinates, we now require a function that will superimpose
more than two coordinate arrays, i.e. so we can superpose all the conformations of a whole
structure ensemble. We will do this by repeating pairwise superimpositions relative to
some reference. Repeating the superimposition of structure pairs is not actually the
optimal way to align structures; it would be better to have a method that compared all
against all at the same time. However, more complicated methods are beyond the scope of
this book, and the result we get here will be very good in most circumstances. Indeed we
try to be a little clever in the way that we perform pair-superimposition in the following
examples. Initially, we arbitrarily align all coordinates with the first set, from which we
calculate an average (albeit not real) structure. We then find the real structure which is
closest to the average to use as a reference for a second round of superimposing
alignment; we assume the reference is somewhere near the ‘middle’ of the spread of
coordinates. Also, in the second round we change the weights, which may initially be set
according to atom mass, to values that represent how variable each atom position was
during the first round. Thus, the final coordinate superimposition will be biased towards
the atom positions that were most similar in the first round and dissimilar regions (flexible
parts of an ensemble, for example) will not have much influence. This frees us from
having to specify which parts of the structures are most invariant and gives better overall
results. It should be noted that the whole superimposition procedure may be repeated more
than once to get the coordinates to converge more, although the convergence is usually
very good with only one pass.
def superimposeCoords(allCoords, weights, threshold=5.0):
nStruct = len(allCoords)
The reference coordinate array, that will remain stationary during the alignment, is
arbitrarily set to the first from the list of all arrays. We then create an array of zeros of the
same size to which we will subsequently add all of the coordinates (element by element
inside the array) after they are aligned to the reference, so that we can find the average.
The list of rotation matrices, which we pass back at the end of the function, starts empty
and will be filled as we do the alignments.
refCoords = allCoords[0]
meanCoords = zeros(refCoords.shape)
rotations = []
The first round of coordinate superimposition (alignment) is achieved by looping
through all the coordinate arrays and aligning them to the reference set. Note that because
the reference array is simply the first array in the list we don’t need to align the first
coordinates (with themselves). Accordingly, when the loop index (generated by
enumerate()) is zero we can skip the alignment and set the rotation matrix as the 3×3
identity matrix, identity(3) (representing no rotation). Otherwise, the coordinate alignment
is done with the appropriate function and the optimised rotation matrix and updated
coordinates are obtained. These new coordinates are put back into the allCoords array at
the current index, replacing the previous coordinates (the alignment makes a new
coordinate array and doesn’t affect the originals). Then the rotation matrix is collected into
the rotations list. Finally in the loop, the coordinate array is added to the total held in
meanCoords, which is divided by the number of structures, after the loop, to give the
average positions.
for index, coords in enumerate(allCoords):
if index == 0:
rotation = identity(3)
else:
rotation, coords = alignCoords(refCoords, coords, weights)
allCoords[index] = coords # Update to aligned
rotations.append(rotation)
meanCoords += coords
meanCoords /= nStruct
We calculate the RMSD values for the structures and atoms using the above defined
function. Note that these values will be relative to the coordinate average (passed in as the
first argument) and will respect the weights. From the resulting RMSDs it is a simple
matter to find the best (smallest) value. The index of this value in the rmsds list then
allows the selection of the appropriate coordinate array within the list of all coordinates.
This best set of coordinates will then become the reference structure for a second round of
superimposition, with adjusted weights. Effectively this reference array is the structure
closest to the mean of the aligned coordinates.
rmsds, atomRmsds = calcRmsds(meanCoords, allCoords, weights)
bestRmsd = min(rmsds)
bestIndex = rmsds.index(bestRmsd) # Closest to mean
bestCoords = allCoords[bestIndex]
Adjusted weights are defined for a second round of coordinate alignment. The weights
that are used are derived from the RMSD values for the corresponding atoms. We scale the
RMDS value by a threshold value, which effectively defines the sensitivity to structural
variation. Then the new weights are calculated as the negative exponent of the scale values
squared. In this way atoms with small RMSD values will have the largest weights and as
the variance increases the weighting will diminish exponentially. Note that the below is
done on NumPy array objects (and the exp() function is the numpy version) so that the
operations are performed on all of the elements in the array at once.
weightScale = atomRmsds/threshold
weights *= exp(-weightScale*weightScale)
We define an array, which will contain the average coordinates, by making a copy of
the coordinate array that was used as the superimposition reference: the set that was
closest to the mean. A loop is then used to go through all of the coordinate arrays, noting
that if the index matches our reference coordinates (indexBest) then we skip that loop; we
don’t have to add the coordinates to the average as they are already in the required array,
and because it is the reference there is no need to do the superimposition. For the other
coordinate arrays, the superimposing alignment is performed, optimising the rotation to
the reference. Each new rotation matrix is replaced in the list of rotations and naturally the
rotation matrix for the reference is left as it was before. Updated coordinates are inserted
back into allCoords at the appropriate index and then added, element-wise, to the array
used to calculate the coordinate average.
meanCoords = bestCoords.copy()
for index, coords in enumerate(allCoords):
if index != bestIndex:
rotation, coords = alignCoords(bestCoords, coords, weights)
rotations[index] = rotation
allCoords[index] = coords # Update to aligned
meanCoords += coords
meanCoords /= nStruct
The summation of the coordinate arrays is then divided by the number of structures to
get the revised average (mean) coordinate set. Finally, a separate function is used to
calculate the RMSD values of the structures and individual atoms. Note that these values
compare the new coordinates with the average set of coordinates
16
and also that we use
redefined, unbiased weights (set to 1.0 courtesy of NumPy’s handy ones() function) so that
we report the observed distance variation for each atom equally. Then the final coordinate
array, the RMSDs and the list of rotation matrices are passed back at the return statement.
weights = ones(len(weights))
rmsds, atomRmsds = calcRmsds(meanCoords, allCoords, weights)
return allCoords, rmsds, atomRmsds, rotations
Given that we now have a function that can perform an optimised superimposition of
coordinate arrays, another function is required that will apply the procedure to Structure
objects. Accordingly, we define a function that takes a list of such objects, from which the
coordinate arrays can be extracted, perform the coordinate superimposition, and then
update the original Atom objects with the new coordinates. This function requires a way
of getting the weightings of different types of atoms, so we use the previously defined
ATOMIC_NUMS to give that information, although it is likely that in practical situations
such a dictionary will be defined in another, more general module.
def superimposeStructures(structures):
In the function the weights variable is initially set to None, although it will be set to a
list of numbers once we come across the first structure object. Then we define an empty
list to contain all of the coordinates for all of the structures; this will be a list of two-
dimensional arrays.
weights = None
allCoords = []
The structures are inspected in turn inside a loop, and for each we use the existing
getAtomCoords() function to get the coordinate array and the corresponding list of Atom
objects. In essence, we convert from our data model to NumPy arrays. Also in the loop we
define the list of weights, if it is not already defined, by using the atomic numbers of the
atoms, which we look up in a dictionary. This of course assumes that the weights for one
structure will do for all structures, a reasonable assumption because we are only
superimposing the same kinds of atoms. Next the coordinates are redefined by moving
them to the centre (zero on all axes), using the centerCoords() function and respecting the
atom weights, which are now in an array object. Finally at the end of the loop the centred
coordinate array is put in the list containing all coordinates.
for structure in structures:
atoms, coords = getAtomCoords(structure)
if weights is None:
weights = [ATOMIC_NUMS[atom.element] for atom in atoms]
weights = array(weights)
coords, center = centerCoords(coords, weights)
allCoords.append(coords)
We then run the function that actually performs the coordinate superimposition on the
arrays of coordinates. The results we get back are: the array of modified coordinates, a list
of RMSD values for each structure, a list of RMSD values for each atom (across all
structures) and a list of the rotations that were applied to each of the structures to do the
superimposition. Although the rotation data is not useful here, it is used in other contexts,
as illustrated in later examples.
results = superimposeCoords(allCoords, weights)
allCoords, rmsds, atomRmsds, rotations = results
Lastly in the function we go through each of the structures, using enumerate() to get a
list index as well as a Structure object. The getAtomCoords() function is used to get a list
of atoms and an array of the original coordinates (which we are not actually interested in).
Then a second loop is performed, going through each atom index (j) and atom object. The
coordinates of the atom are then updated from the array that contains the new,
superimposed locations. Note how we use the two indices to get the correct group of
coordinates from the allCoords array, and that the group is immediately unpacked into
three variables. With the Atom objects updated with new positions, the loops and then the
function end, passing back the RMSD values that specify how good the coordinate
superimposition was for each structure, and the individual atoms.
for i, structure in enumerate(structures):
atoms, oldCoords = getAtomCoords(structure)
for j, atom in enumerate(atoms):
atom.coords = allCoords[i][j]
return rmsds, atomRmsds
We can test the function with two or more structures, as long as they represent the same
atoms. Here we load the same structure from file twice, into two separate objects, then
rotate one of them before trying the coordinate superimposition. This is a reasonable test
because the structures are the same, aside from the transformation, and thus the
superimposition should be near perfect, with RMSD values of zero.
from Modelling import getStructuresFromFile
strucA = getStructuresFromFile('examples/1A12.pdb')[0]
strucB = getStructuresFromFile('examples/1A12.pdb')[0]
rotateStructureNumPy(strucA, (1,0,0), pi/3)
rmsds, atomRmsds = superimposeStructures([strucA, strucB])
print(rmsds) # Hopefully all close to zero
If we get a whole ensemble of structural models, for example by downloading data
from an NMR-derived PDB entry, then we can test the alignment on the corresponding list
of structures.
fileName = downloadPDB('1UST')
strucObjs = getStructuresFromFile(fileName)
coords, rmsds, atomRmsds, rotation = superimposeStructures(strucObjs)
print(rmsds)
Do'stlaringiz bilan baham: |