Cython implementation
The Cython language allows C-like constructs, including data type information, to be
introduced into what otherwise looks like Python code. This allows for much faster
execution at the expense of more complicated coding, but compared to regular C using
Cython greatly simplifies the conversion between the C and Python worlds. Cython code
is conventionally placed in a file ending with .pyx, and the Cython compiler then converts
that into a compiled C library which can then be imported in Python code, the same way
as with directly compiled C code. The compilation is normally done via a Python setup.py
file that controls a regular C compiler (like cc or gcc). For details see the Cython
documentation (see links at
http://www.cambridge.org/pythonforbiology
).
Pure Python code is valid Cython, and you can usually get a modest speed-up just
running ordinary Python code through the Cython compiler, although this will naturally
restrict portability because the code will only run on the processor architecture for which it
was compiled. The Cython implementation of the self-organising map could be made very
similar to the C implementation. As in the Python+C implementation, you could convert
from Python objects to their C equivalent, then do the calculation, then convert the result
back from C objects to Python objects. In particular, a list in Python would get converted
to a C pointer and require memory allocation and freeing. However, we will avoid the
complication of data conversion and memory allocation and freeing by instead using
NumPy data throughout the code. This makes the code simple to write and avoids a
tedious aspect of writing code in C.
We will not explain all the Cython constructs, but the main difference in syntax with
ordinary Python is that C data types are specified using cdef. Though Python objects can
also have a type specified, and in general even that makes for faster execution. In addition,
imports of C libraries are done using cimport instead of import. Here we need the NumPy
C ndarray data type. We also need random and exp from the standard NumPy. Finally, we
need the import of cython itself.
from numpy cimport ndarray
from numpy import random, exp
import cython
The main function in the Cython implementation is different from other functions we
have seen in that it has some additional Cython-specific typing for some of the arguments.
In particular, we want the inputs and spread arguments to be two-dimensional NumPy
arrays. They both have type double, which is the default NumPy type for floating point
(real number) data.
def selfOrganisingMap(ndarray[double, ndim=2] inputs,
ndarray[double, ndim=2] spread,
size, nsteps):
We then define the incidental variables we use in the code to all be C types rather than
Python types. This includes the map itself, which will be a three-dimensional NumPy
array. It is important to use C data types to make Cython code fast, although some
variables are less important than others for speed.
cdef int nrows, ncols, ninputs, depth, width, height
cdef int step, i, j
cdef double decay
cdef ndarray[double, ndim=3] somap
We then initialise the variables. The somap variable can be initialised using the standard
NumPy random number generation functionality.
nrows, ncols = size
ninputs = len(inputs)
depth = len(inputs[0])
width = len(spread)
height = len(spread[0])
somap = random.random((nrows, ncols, depth))
We then iterate over nsteps in the same way as previously.
for step in range(nsteps):
decay = exp(-step / float(nsteps))
for i in range(ninputs):
updateSomap(somap, inputs[i], spread, nrows, ncols, depth,
width, height, decay)
Finally, we return somap.
return somap
The important work is done in the updateSomap() function. In order to make Cython as
quick as possible, we turn off some checking that would otherwise be done on arrays,
using function decorators from the cython module (with the ‘@’ syntax; see
Chapter 5
).
This is dangerous to do and should only be done if you are very confident that your code
has no array-indexing bugs in it. Python handles such errors graciously, but C does not.
@cython.boundscheck(False)
@cython.nonecheck(False)
@cython.wraparound(False)
cdef void updateSomap(ndarray[double, ndim=3] somap,
ndarray[double, ndim=1] inputs,
ndarray[double, ndim=2] spread,
int nrows, int ncols, int depth,
int width, int height, double decay):
We then define the variables we will use.
cdef int halfWidth, halfHeight, i, j, k, l, m, imin, jmin
cdef double alpha, diff, diff2, diff2min
The rest of the code is very similar to the C code. It is important to use the bracket
notation [i, j, k] for array access rather than [i][j][k] because otherwise sub-arrays are
constructed, which significantly reduces the speed. It is also important that these indices
be C integers rather than Python integers.
halfWidth = (width-1) // 2
halfHeight = (height-1) // 2
imin = jmin = -1
diff2min = 0.0
for i in range(nrows):
for j in range(ncols):
diff2 = 0.0
for k in range(depth):
diff = somap[i,j,k] - inputs[k]
diff2 += diff * diff
if ((imin == -1) or (diff2 < diff2min)):
imin = i
jmin = j
diff2min = diff2
for k in range(width):
i = (imin + k - halfWidth + nrows) % nrows
for l in range(height):
j = (jmin + l - halfHeight + ncols) % ncols
alpha = decay * spread[k,l]
for m in range(depth):
somap[i,j,m] = (1.0-alpha) * somap[i,j,m] + alpha * inputs[m]
We could use the following NumPy code, which is similar to code we used in the
NumPy implementation, to replace the calculation of imin and jmin:
cdef ndarray[double, ndim=3] diff
cdef ndarray[double, ndim=2] diff2
diff = somap - inputs[i]
diff2 = (diff*diff).sum(axis=2)
cdef int index = dist2.argmin()
imin = index // nrows
jmin = index % nrows
However, it turns out that this is much slower.
Before the above functions can be used, the code must be transformed into binary code
that can actually be run. This is a two-step process in Cython. Initially the Python-like
code is converted into the equivalent C and then the C code is compiled into binary code
that can be executed on a particular type of computer system.
7
These conversions are
fairly easy to do using the Cython system. In essence, we run a setup script with Python,
which from an operating-system prompt would look something like:
>python setup.py build_ext --inplace
The ‘setup.py’ script instructs Cython which Python-like .pyx files to convert into fast,
compiled Python modules, and what the names of those modules should be. We give full
details of the setup script and how compilation is practically achieved on different
computer systems at
http://www.cambridge.org/pythonforbiology
. Our example uses
‘somapCython.pyx’ to create the ‘somapCython’ module (the actual compiled file is
‘somapCython.so’), which can then be imported into standard Python code:
from somapCython import selfOrganisingMap
And, if we set appropriate values for the input arguments, the function can be called
from Python in the normal manner:
mapOut = selfOrganisingMap(inData, spreadArray, mapSize, numSteps)
Do'stlaringiz bilan baham: |