Python Programming for Biology: Bioinformatics and Beyond

Representing structures as NumPy arrays

Download 7,75 Mb.

Pdf ko'rish

bet	223/514
Sana	30.12.2021
Hajmi	7,75 Mb.
	#91066

1 ... 219 220 221 222 223 224 225 226 ... 514

Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Representing structures as NumPy arrays

The examples above used standard Python loops and didn’t make full use of the NumPy

array objects that store the atom coordinates. Accordingly, given that each atom position is

described as a single NumPy array (a vector of x, y and z coordinates) each atomic

coordinate can be placed inside another larger array, to form a two-dimensional matrix

which represents the whole structure. To be able to work with whole 2D arrays of

coordinates we must first transfer each atom’s coordinates from the data model into the 2D

array. Once defined, the 2D array data can be manipulated without having to repeatedly

loop through each atom in turn; most operations can work on all atoms at the same time.

Internally there will actually be looping going on, but we do not have to think about that

and can trust that it will occur in an efficient and expected manner.

The following function loops through all of the atoms in a structure and places the

coordinates in a NumPy array object. This function is thus general and can be used any

time we wish to move from the hierarchical molecular data model to a numeric array

system. Firstly, we import the required functions from the numpy module: array which

will convert a Python list or tuple and dot for matrix multiplication.

from numpy import array, dot

def getAtomCoords(structure):

coords = []

atoms = []

for chain in structure.chains:

for residue in chain.residues:

for atom in residue.atoms:

coords.append(atom.coords)

atoms.append(atom)

return atoms, array(coords)

Inside the function we simply loop through the chains, residue and atoms and append

the atom coordinates to a larger list of coordinates. We also collect a list of the atom

objects at the same time, so that we will know which coordinate goes with which atom;

obviously useful if we want to update the atom positions after changing the coordinates.

Finally, we return the list of atoms and the coordinates, taking special notice of the fact

that the coordinates are converted from the normal Python list into a NumPy array object

(which here will be two-dimensional).

We will now use this array-generating function within another function that rotates an

input structure. This does exactly the same job as the rotation function described earlier,

but the innards are much simpler to write because we use one 2D array for all the

coordinates and avoid writing loops. As before, the function takes an input angle and axis

to define a rotation matrix which is converted to an array. The getAtomCoords() function

just described is used to get the array object that represents all of the coordinates.

def rotateStructureNumPy(structure, axis=array([1,0,0]), angle=0):

rMatrix = array(getRotationMatrix(axis, angle))

atoms, coords = getAtomCoords(structure)

coords = dot(coords, rMatrix.T)

for index, atom in enumerate(atoms):

atom.coords = list(coords[index])

The actual rotation operation is exceedingly simple: we just use the NumPy dot()

operation to multiply the coordinate array by the rotation matrix;

this can be described as

applying a coordinate transformation. Finally, the updated coordinates are used to update

the coordinates of the atom objects. Note that we loop through the atom objects with the

enumerate() function, to get not only the atom but also an index position. This index is

then used to identify the correct location in the coordinate array, given that the order of the

atom list exactly matches the order of coordinates; the getAtomCoords()function

guarantees this. The position for each atom is updated in one operation by assigning one

coordinate array (containing three numbers) to the three atom attributes, i.e. the array is

unpacked into separate components.

Here we test the function by rotating the previously loaded structure’s coordinates π/2

radians (90°), about the y axis.

Because the coordinates are modified internally on the

atom objects, there is no value to catch when the function is called. Note that the numeric

constant pi is not known to Python until we import it from the math module.

from math import pi

rotateStructureNumPy(struc, (0,1,0), pi/2)

The next example function moves on from rotation to do an arbitrary transformation of

the structural coordinates. Such a change is called an affine transformation in maths-

speak, and involves a transformation matrix, which may or may not be a rotation, and a

translation: a vector to specify movement from one location to another. The transformation

matrix may be used to stretch, reflect, shrink, enlarge and rotate the structure.

Firstly, we import the identity function from the NumPy module, which is used to

create an identity matrix, where the diagonal elements are one and the other elements are

zero. The identity matrix represents no transformation at all when used in multiplication,

just as normal multiplication by 1 doesn’t change a number. We use an identity matrix of

size three

as the default value of the transform variable, so that unless we state otherwise

no matrix transformation occurs. The default for the translation is a vector of zeros; so no

movement occurs.

from numpy import identity

def affineTransformStructure(structure, transform=identity(3),

translate=(0,0,0)):

atoms, coords = getAtomCoords(structure)

coords = dot(coords, transform.T)

coords = coords + translate

for index, atom in enumerate(atoms):

atom.coords = coords[index]

As with the rotation example we use the getAtomCoords() to get the list of atom objects

and the corresponding array of coordinates. The dot() call is used to apply the

transformation, i.e. by matrix multiplication. The translation, to move the coordinates, is

achieved by simply adding the translate vector, containing the numbers to add to the x, y

and z locations, to the coordinate array. Although the coords variable is a two-dimensional

numpy.array, and the translate vector is only one-dimensional (just three values, one for

each axis), the array addition repeats over all of the rows in the larger array. This addition

does a simple element-wise addition for the x, y and z numbers in each atom row;

numbers at equivalent columns in the two arrays are added together to make a new array

of the same size as the coordinate array.

Below, the function is tested on a loaded structure with a transformation that creates a

mirror image (the coordinates take on the opposite sign; negative to positive, positive to

negative) and a translation which is a movement of 10 units along the x and y axes. Using

the basic PDB writer function writeStructureToFile() that is available in the on-line

material we can write the modified structure, with transformed coordinates, to a new file

so that we can view the structure in graphics software to see the result of our handiwork.

Using graphical software to view molecular structures will be discussed later in this

chapter.

mirrorTransform = array([[-1,0,0], [0,-1,0], [0,0,-1]])

translate = array([10.0,10.0,0.0])

affineTransformStructure(struc, mirrorTransform, translate)

from Structures import writeStructureToFile

writeStructureToFile(struc, 'testTransform.pdb')

Download 7,75 Mb.

Do'stlaringiz bilan baham:

1 ... 219 220 221 222 223 224 225 226 ... 514