Python Programming for Biology: Bioinformatics and Beyond

Calculating root-mean-square deviation

Download 7,75 Mb.

Pdf ko'rish

bet	229/514
Sana	30.12.2021
Hajmi	7,75 Mb.
	#91066

1 ... 225 226 227 228 229 230 231 232 ... 514

Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Calculating root-mean-square deviation

In order to help us interpret the coordinate superimposition, and because it is required in

subsequent examples, we define a function that will calculate the variation in the

coordinates across the atom positions represented in our arrays. Strictly this is the

mathematical definition often called root-mean-square deviation or RMSD for short. As

the name suggests the RMSD value is calculated by taking the differences in coordinate

positions, squaring them, finding the average value and then the square root of this. This is

effectively the average distance of coordinate spread, although it should be noted that it is

an average of squares, not the distances themselves, and so is biased more towards the

larger deviations having more influence. (Note that the alignCoords() function can easily

be adapted to calculate RMSD for the overall transformation, but we have deliberately

avoided complicating the function further.)

The function that calculates the RMSDs takes an array of reference coordinates, a list of

the other coordinate array to compare with and a list of weights, so that each atom position

can be biased separately. Inside the function we initialise a list to hold the RMSD values

for each structure, find the total of all the input weights (using the handy, inbuilt sum()

function) and initialise an empty array of zeros that is the same size as the coordinate

arrays, which will hold the summation of positional differences.

def calcRmsds(refCoords, allCoords, weights):

rmsds = []

totalWeight = sum(weights)

totalSquares = zeros(refCoords.shape)

for coords in allCoords:

delta = coords-refCoords

squares = delta * delta

totalSquares += squares

sumSquares = weights*squares.sum(axis=1)

rmsds.append( sqrt(sum(sumSquares)/totalWeight) )

nStruct = len(allCoords)

atomRmsds = sqrt(totalSquares.sum(axis=1)/nStruct)

return rmsds, atomRmsds

The bulk of the function involves looping through the list of coordinate arrays and

comparing them to the reference. The operations in this loop all involve whole array

objects, and so when we add, subtract, multiply and divide the operations are applied to all

elements on the arrays at the same time. This is the advantage of using the NumPy arrays:

it simplifies the code and avoids having to write more loops. Accordingly delta is the array

of all coordinate differences, and the elements of this whole array are squared to give

squares. The square coordinate differences are added to the array of totals for use later.

The squared deviation for each atom is calculated as the sum of the square values along

the spatial axis (i.e. x

+ y

+ z

). Here this is done using the

squares.sum(axis=1)operation, thus we get the total for each atom separately and form

another array. This is then multiplied by the weights for the atoms to give sumSquares,

which represents the contribution of each atom to the coordinate ‘deviation’. Lastly, the

sumSquares is summed over all atoms to give a single value, which is divided by the total

weight to find the average atomic square deviation for each structure. The square root of

this (hence root-mean-square deviation) is placed in the RMSD list.

Once the loop is complete the RMSD values for the individual atoms are calculated.

Given that the square differences for the atoms were added to totalSquares for all of the

coordinate arrays (all structures) the average of these is then used to calculate each atom’s

RMSD over the whole set of conformations. As before, this is all done with NumPy

operations, to work on whole arrays without loops.

Download 7,75 Mb.

Do'stlaringiz bilan baham:

1 ... 225 226 227 228 229 230 231 232 ... 514