Python Programming for Biology: Bioinformatics and Beyond


Aligning a structure ensemble



Download 7,75 Mb.
Pdf ko'rish
bet230/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   226   227   228   229   230   231   232   233   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Aligning a structure ensemble

Given  that  we  now  have  a  mathematical  function  that  can  find  the  optimum  rotation  to

superimpose two arrays of coordinates, we now require a function that will superimpose

more than two coordinate arrays, i.e. so we can superpose all the conformations of a whole



structure  ensemble.  We  will  do  this  by  repeating  pairwise  superimpositions  relative  to

some  reference.  Repeating  the  superimposition  of  structure  pairs  is  not  actually  the

optimal  way  to  align  structures;  it  would  be  better  to  have  a  method  that  compared  all

against all at the same time. However, more complicated methods are beyond the scope of

this book, and the result we get here will be very good in most circumstances. Indeed we

try to be a little clever in the way that we perform pair-superimposition in the following

examples.  Initially,  we  arbitrarily  align  all  coordinates  with  the  first  set,  from  which  we

calculate  an  average  (albeit  not  real)  structure.  We  then  find  the  real  structure  which  is

closest  to  the  average  to  use  as  a  reference  for  a  second  round  of  superimposing

alignment;  we  assume  the  reference  is  somewhere  near  the  ‘middle’  of  the  spread  of

coordinates. Also, in the second round we change the weights, which may initially be set

according  to  atom  mass,  to  values  that  represent  how  variable  each  atom  position  was




during  the  first  round.  Thus,  the  final  coordinate  superimposition  will  be  biased  towards

the atom positions that were most similar in the first round and dissimilar regions (flexible

parts  of  an  ensemble,  for  example)  will  not  have  much  influence.  This  frees  us  from

having to specify which parts of the structures are most invariant and gives better overall

results. It should be noted that the whole superimposition procedure may be repeated more

than  once  to  get  the  coordinates  to  converge  more,  although  the  convergence  is  usually

very good with only one pass.

def superimposeCoords(allCoords, weights, threshold=5.0):

nStruct = len(allCoords)

The  reference  coordinate  array,  that  will  remain  stationary  during  the  alignment,  is

arbitrarily set to the first from the list of all arrays. We then create an array of zeros of the

same  size  to  which  we  will  subsequently  add  all  of  the  coordinates  (element  by  element

inside  the  array)  after  they  are  aligned  to  the  reference,  so  that  we  can  find  the  average.

The list of rotation matrices, which we pass back at the end of the function, starts empty

and will be filled as we do the alignments.

refCoords = allCoords[0]

meanCoords = zeros(refCoords.shape)

rotations = []

The  first  round  of  coordinate  superimposition  (alignment)  is  achieved  by  looping

through all the coordinate arrays and aligning them to the reference set. Note that because

the  reference  array  is  simply  the  first  array  in  the  list  we  don’t  need  to  align  the  first

coordinates  (with  themselves).  Accordingly,  when  the  loop  index  (generated  by

enumerate())  is  zero  we  can  skip  the  alignment  and  set  the  rotation  matrix  as  the  3×3

identity matrix, identity(3) (representing no rotation). Otherwise, the coordinate alignment

is  done  with  the  appropriate  function  and  the  optimised  rotation  matrix  and  updated

coordinates are obtained. These new coordinates are put back into the allCoords array at

the  current  index,  replacing  the  previous  coordinates  (the  alignment  makes  a  new

coordinate array and doesn’t affect the originals). Then the rotation matrix is collected into

the  rotations  list.  Finally  in  the  loop,  the  coordinate  array  is  added  to  the  total  held  in

meanCoords,  which  is  divided  by  the  number  of  structures,  after  the  loop,  to  give  the

average positions.

for index, coords in enumerate(allCoords):

if index == 0:

rotation = identity(3)

else:

rotation, coords = alignCoords(refCoords, coords, weights)



allCoords[index] = coords # Update to aligned

rotations.append(rotation)

meanCoords += coords

meanCoords /= nStruct

We  calculate  the  RMSD  values  for  the  structures  and  atoms  using  the  above  defined

function. Note that these values will be relative to the coordinate average (passed in as the




first  argument)  and  will  respect  the  weights.  From  the  resulting  RMSDs  it  is  a  simple

matter  to  find  the  best  (smallest)  value.  The  index  of  this  value  in  the  rmsds  list  then

allows the selection of the appropriate coordinate array within the list of all coordinates.

This best set of coordinates will then become the reference structure for a second round of

superimposition,  with  adjusted  weights.  Effectively  this  reference  array  is  the  structure

closest to the mean of the aligned coordinates.

rmsds, atomRmsds = calcRmsds(meanCoords, allCoords, weights)

bestRmsd = min(rmsds)

bestIndex = rmsds.index(bestRmsd) # Closest to mean

bestCoords = allCoords[bestIndex]

Adjusted weights are defined for a second round of coordinate alignment. The weights

that are used are derived from the RMSD values for the corresponding atoms. We scale the

RMDS  value  by  a  threshold  value,  which  effectively  defines  the  sensitivity  to  structural

variation. Then the new weights are calculated as the negative exponent of the scale values

squared. In this way atoms with small RMSD values will have the largest weights and as

the  variance  increases  the  weighting  will  diminish  exponentially.  Note  that  the  below  is

done  on  NumPy  array  objects  (and  the  exp()  function  is  the  numpy  version)  so  that  the

operations are performed on all of the elements in the array at once.

weightScale = atomRmsds/threshold

weights *= exp(-weightScale*weightScale)

We  define  an  array,  which  will  contain  the  average  coordinates,  by  making  a  copy  of

the  coordinate  array  that  was  used  as  the  superimposition  reference:  the  set  that  was

closest to the mean. A loop is then used to go through all of the coordinate arrays, noting

that if the index matches our reference coordinates (indexBest) then we skip that loop; we

don’t have to add the coordinates to the average as they are already in the required array,

and  because  it  is  the  reference  there  is  no  need  to  do  the  superimposition.  For  the  other

coordinate  arrays,  the  superimposing  alignment  is  performed,  optimising  the  rotation  to

the reference. Each new rotation matrix is replaced in the list of rotations and naturally the

rotation matrix for the reference is left as it was before. Updated coordinates are inserted

back  into  allCoords  at  the  appropriate  index  and  then  added,  element-wise,  to  the  array

used to calculate the coordinate average.

meanCoords = bestCoords.copy()

for index, coords in enumerate(allCoords):

if index != bestIndex:

rotation, coords = alignCoords(bestCoords, coords, weights)

rotations[index] = rotation

allCoords[index] = coords # Update to aligned

meanCoords += coords

meanCoords /= nStruct

The summation of the coordinate arrays is then divided by the number of structures to

get  the  revised  average  (mean)  coordinate  set.  Finally,  a  separate  function  is  used  to

calculate the RMSD values of the structures and individual atoms. Note that these values




compare  the  new  coordinates  with  the  average  set  of  coordinates

16

 and  also  that  we  use



redefined, unbiased weights (set to 1.0 courtesy of NumPy’s handy ones() function) so that

we report the observed distance variation for each atom equally. Then the final coordinate

array, the RMSDs and the list of rotation matrices are passed back at the return statement.

weights = ones(len(weights))

rmsds, atomRmsds = calcRmsds(meanCoords, allCoords, weights)

return allCoords, rmsds, atomRmsds, rotations

Given  that  we  now  have  a  function  that  can  perform  an  optimised  superimposition  of

coordinate  arrays,  another  function  is  required  that  will  apply  the  procedure  to  Structure

objects. Accordingly, we define a function that takes a list of such objects, from which the

coordinate  arrays  can  be  extracted,  perform  the  coordinate  superimposition,  and  then

update the original Atom objects with the new coordinates. This function requires a way

of  getting  the  weightings  of  different  types  of  atoms,  so  we  use  the  previously  defined

ATOMIC_NUMS to give that information, although it is likely that in practical situations

such a dictionary will be defined in another, more general module.

def superimposeStructures(structures):

In the function the weights variable is initially set to None, although it will be set to a

list of numbers once we come across the first structure object. Then we define an empty

list  to  contain  all  of  the  coordinates  for  all  of  the  structures;  this  will  be  a  list  of  two-

dimensional arrays.

weights = None

allCoords = []

The  structures  are  inspected  in  turn  inside  a  loop,  and  for  each  we  use  the  existing

getAtomCoords() function to get the coordinate array and the corresponding list of Atom

objects. In essence, we convert from our data model to NumPy arrays. Also in the loop we

define the list of weights, if it is not already defined, by using the atomic numbers of the

atoms, which we look up in a dictionary. This of course assumes that the weights for one

structure  will  do  for  all  structures,  a  reasonable  assumption  because  we  are  only

superimposing  the  same  kinds  of  atoms.  Next  the  coordinates  are  redefined  by  moving

them to the centre (zero on all axes), using the centerCoords() function and respecting the

atom weights, which are now in an array object. Finally at the end of the loop the centred

coordinate array is put in the list containing all coordinates.

for structure in structures:

atoms, coords = getAtomCoords(structure)

if weights is None:

weights = [ATOMIC_NUMS[atom.element] for atom in atoms]

weights = array(weights)

coords, center = centerCoords(coords, weights)

allCoords.append(coords)

We then run the function that actually performs the coordinate superimposition on the



arrays of coordinates. The results we get back are: the array of modified coordinates, a list

of  RMSD  values  for  each  structure,  a  list  of  RMSD  values  for  each  atom  (across  all

structures) and a list of the rotations that were applied to each of the structures to do the

superimposition. Although the rotation data is not useful here, it is used in other contexts,

as illustrated in later examples.

results = superimposeCoords(allCoords, weights)

allCoords, rmsds, atomRmsds, rotations = results

Lastly in the function we go through each of the structures, using enumerate() to get a

list index as well as a Structure object. The getAtomCoords() function is used to get a list

of atoms and an array of the original coordinates (which we are not actually interested in).

Then a second loop is performed, going through each atom index (j) and atom object. The

coordinates  of  the  atom  are  then  updated  from  the  array  that  contains  the  new,

superimposed  locations.  Note  how  we  use  the  two  indices  to  get  the  correct  group  of

coordinates  from  the  allCoords  array,  and  that  the  group  is  immediately  unpacked  into

three variables. With the Atom objects updated with new positions, the loops and then the

function  end,  passing  back  the  RMSD  values  that  specify  how  good  the  coordinate

superimposition was for each structure, and the individual atoms.

for i, structure in enumerate(structures):

atoms, oldCoords = getAtomCoords(structure)

for j, atom in enumerate(atoms):

atom.coords = allCoords[i][j]

return rmsds, atomRmsds

We can test the function with two or more structures, as long as they represent the same

atoms.  Here  we  load  the  same  structure  from  file  twice,  into  two  separate  objects,  then

rotate one of them before trying the coordinate superimposition. This is a reasonable test

because  the  structures  are  the  same,  aside  from  the  transformation,  and  thus  the

superimposition should be near perfect, with RMSD values of zero.

from Modelling import getStructuresFromFile

strucA = getStructuresFromFile('examples/1A12.pdb')[0]

strucB = getStructuresFromFile('examples/1A12.pdb')[0]

rotateStructureNumPy(strucA, (1,0,0), pi/3)

rmsds, atomRmsds = superimposeStructures([strucA, strucB])

print(rmsds) # Hopefully all close to zero

If  we  get  a  whole  ensemble  of  structural  models,  for  example  by  downloading  data

from an NMR-derived PDB entry, then we can test the alignment on the corresponding list

of structures.

fileName = downloadPDB('1UST')

strucObjs = getStructuresFromFile(fileName)

coords, rmsds, atomRmsds, rotation = superimposeStructures(strucObjs)



print(rmsds)


Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   226   227   228   229   230   231   232   233   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish