Python Programming for Biology: Bioinformatics and Beyond

Download 7,75 Mb.

Pdf ko'rish

bet	443/514
Sana	30.12.2021
Hajmi	7,75 Mb.
	#91066

1 ... 439 440 441 442 443 444 445 446 ... 514

Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Cython implementation

The Cython language allows C-like constructs, including data type information, to be

introduced into what otherwise looks like Python code. This allows for much faster

execution at the expense of more complicated coding, but compared to regular C using

Cython greatly simplifies the conversion between the C and Python worlds. Cython code

is conventionally placed in a file ending with .pyx, and the Cython compiler then converts

that into a compiled C library which can then be imported in Python code, the same way

as with directly compiled C code. The compilation is normally done via a Python setup.py

file that controls a regular C compiler (like cc or gcc). For details see the Cython

documentation (see links at

http://www.cambridge.org/pythonforbiology

Pure Python code is valid Cython, and you can usually get a modest speed-up just

running ordinary Python code through the Cython compiler, although this will naturally

restrict portability because the code will only run on the processor architecture for which it

was compiled. The Cython implementation of the self-organising map could be made very

similar to the C implementation. As in the Python+C implementation, you could convert

from Python objects to their C equivalent, then do the calculation, then convert the result

back from C objects to Python objects. In particular, a list in Python would get converted

to a C pointer and require memory allocation and freeing. However, we will avoid the

complication of data conversion and memory allocation and freeing by instead using

NumPy data throughout the code. This makes the code simple to write and avoids a

tedious aspect of writing code in C.

We will not explain all the Cython constructs, but the main difference in syntax with

ordinary Python is that C data types are specified using cdef. Though Python objects can

also have a type specified, and in general even that makes for faster execution. In addition,

imports of C libraries are done using cimport instead of import. Here we need the NumPy

C ndarray data type. We also need random and exp from the standard NumPy. Finally, we

need the import of cython itself.

from numpy cimport ndarray

from numpy import random, exp

import cython

The main function in the Cython implementation is different from other functions we

have seen in that it has some additional Cython-specific typing for some of the arguments.

In particular, we want the inputs and spread arguments to be two-dimensional NumPy

arrays. They both have type double, which is the default NumPy type for floating point

(real number) data.

def selfOrganisingMap(ndarray[double, ndim=2] inputs,

ndarray[double, ndim=2] spread,

size, nsteps):

We then define the incidental variables we use in the code to all be C types rather than

Python types. This includes the map itself, which will be a three-dimensional NumPy

array. It is important to use C data types to make Cython code fast, although some

variables are less important than others for speed.

cdef int nrows, ncols, ninputs, depth, width, height

cdef int step, i, j

cdef double decay

cdef ndarray[double, ndim=3] somap

We then initialise the variables. The somap variable can be initialised using the standard

NumPy random number generation functionality.

nrows, ncols = size

ninputs = len(inputs)

depth = len(inputs[0])

width = len(spread)

height = len(spread[0])

somap = random.random((nrows, ncols, depth))

We then iterate over nsteps in the same way as previously.

for step in range(nsteps):

decay = exp(-step / float(nsteps))

for i in range(ninputs):

updateSomap(somap, inputs[i], spread, nrows, ncols, depth,

width, height, decay)

Finally, we return somap.

return somap

The important work is done in the updateSomap() function. In order to make Cython as

quick as possible, we turn off some checking that would otherwise be done on arrays,

using function decorators from the cython module (with the ‘@’ syntax; see

Chapter 5

This is dangerous to do and should only be done if you are very confident that your code

has no array-indexing bugs in it. Python handles such errors graciously, but C does not.

@cython.boundscheck(False)

@cython.nonecheck(False)

@cython.wraparound(False)

cdef void updateSomap(ndarray[double, ndim=3] somap,

ndarray[double, ndim=1] inputs,

ndarray[double, ndim=2] spread,

int nrows, int ncols, int depth,

int width, int height, double decay):

We then define the variables we will use.

cdef int halfWidth, halfHeight, i, j, k, l, m, imin, jmin

cdef double alpha, diff, diff2, diff2min

The rest of the code is very similar to the C code. It is important to use the bracket

notation [i, j, k] for array access rather than [i][j][k] because otherwise sub-arrays are

constructed, which significantly reduces the speed. It is also important that these indices

be C integers rather than Python integers.

halfWidth = (width-1) // 2

halfHeight = (height-1) // 2

imin = jmin = -1

diff2min = 0.0

for i in range(nrows):

for j in range(ncols):

diff2 = 0.0

for k in range(depth):

diff = somap[i,j,k] - inputs[k]

diff2 += diff * diff

if ((imin == -1) or (diff2 < diff2min)):

imin = i

jmin = j

diff2min = diff2

for k in range(width):

i = (imin + k - halfWidth + nrows) % nrows

for l in range(height):

j = (jmin + l - halfHeight + ncols) % ncols

alpha = decay * spread[k,l]

for m in range(depth):

somap[i,j,m] = (1.0-alpha) * somap[i,j,m] + alpha * inputs[m]

We could use the following NumPy code, which is similar to code we used in the

NumPy implementation, to replace the calculation of imin and jmin:

cdef ndarray[double, ndim=3] diff

cdef ndarray[double, ndim=2] diff2

diff = somap - inputs[i]

diff2 = (diff*diff).sum(axis=2)

cdef int index = dist2.argmin()

imin = index // nrows

jmin = index % nrows

However, it turns out that this is much slower.

Before the above functions can be used, the code must be transformed into binary code

that can actually be run. This is a two-step process in Cython. Initially the Python-like

code is converted into the equivalent C and then the C code is compiled into binary code

that can be executed on a particular type of computer system.

These conversions are

fairly easy to do using the Cython system. In essence, we run a setup script with Python,

which from an operating-system prompt would look something like:

>python setup.py build_ext --inplace

The ‘setup.py’ script instructs Cython which Python-like .pyx files to convert into fast,

compiled Python modules, and what the names of those modules should be. We give full

details of the setup script and how compilation is practically achieved on different

computer systems at

http://www.cambridge.org/pythonforbiology

. Our example uses

‘somapCython.pyx’ to create the ‘somapCython’ module (the actual compiled file is

‘somapCython.so’), which can then be imported into standard Python code:

from somapCython import selfOrganisingMap

And, if we set appropriate values for the input arguments, the function can be called

from Python in the normal manner:

mapOut = selfOrganisingMap(inData, spreadArray, mapSize, numSteps)

Download 7,75 Mb.

Do'stlaringiz bilan baham:

1 ... 439 440 441 442 443 444 445 446 ... 514