Calculating root-mean-square deviation
In order to help us interpret the coordinate superimposition, and because it is required in
subsequent examples, we define a function that will calculate the variation in the
coordinates across the atom positions represented in our arrays. Strictly this is the
mathematical definition often called root-mean-square deviation or RMSD for short. As
the name suggests the RMSD value is calculated by taking the differences in coordinate
positions, squaring them, finding the average value and then the square root of this. This is
effectively the average distance of coordinate spread, although it should be noted that it is
an average of squares, not the distances themselves, and so is biased more towards the
larger deviations having more influence. (Note that the alignCoords() function can easily
be adapted to calculate RMSD for the overall transformation, but we have deliberately
avoided complicating the function further.)
The function that calculates the RMSDs takes an array of reference coordinates, a list of
the other coordinate array to compare with and a list of weights, so that each atom position
can be biased separately. Inside the function we initialise a list to hold the RMSD values
for each structure, find the total of all the input weights (using the handy, inbuilt sum()
function) and initialise an empty array of zeros that is the same size as the coordinate
arrays, which will hold the summation of positional differences.
def calcRmsds(refCoords, allCoords, weights):
rmsds = []
totalWeight = sum(weights)
totalSquares = zeros(refCoords.shape)
for coords in allCoords:
delta = coords-refCoords
squares = delta * delta
totalSquares += squares
sumSquares = weights*squares.sum(axis=1)
rmsds.append( sqrt(sum(sumSquares)/totalWeight) )
nStruct = len(allCoords)
atomRmsds = sqrt(totalSquares.sum(axis=1)/nStruct)
return rmsds, atomRmsds
The bulk of the function involves looping through the list of coordinate arrays and
comparing them to the reference. The operations in this loop all involve whole array
objects, and so when we add, subtract, multiply and divide the operations are applied to all
elements on the arrays at the same time. This is the advantage of using the NumPy arrays:
it simplifies the code and avoids having to write more loops. Accordingly delta is the array
of all coordinate differences, and the elements of this whole array are squared to give
squares. The square coordinate differences are added to the array of totals for use later.
The squared deviation for each atom is calculated as the sum of the square values along
the spatial axis (i.e. x
2
+ y
2
+ z
2
). Here this is done using the
squares.sum(axis=1)operation, thus we get the total for each atom separately and form
another array. This is then multiplied by the weights for the atoms to give sumSquares,
which represents the contribution of each atom to the coordinate ‘deviation’. Lastly, the
sumSquares is summed over all atoms to give a single value, which is divided by the total
weight to find the average atomic square deviation for each structure. The square root of
this (hence root-mean-square deviation) is placed in the RMSD list.
Once the loop is complete the RMSD values for the individual atoms are calculated.
Given that the square differences for the atoms were added to totalSquares for all of the
coordinate arrays (all structures) the average of these is then used to calculate each atom’s
RMSD over the whole set of conformations. As before, this is all done with NumPy
operations, to work on whole arrays without loops.
Do'stlaringiz bilan baham: |