Python Programming for Biology: Bioinformatics and Beyond



Download 7,75 Mb.
Pdf ko'rish
bet443/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   439   440   441   442   443   444   445   446   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Cython implementation

The  Cython  language  allows  C-like  constructs,  including  data  type  information,  to  be

introduced  into  what  otherwise  looks  like  Python  code.  This  allows  for  much  faster

execution  at  the  expense  of  more  complicated  coding,  but  compared  to  regular  C  using

Cython greatly simplifies the conversion between the C and Python worlds. Cython code

is conventionally placed in a file ending with .pyx, and the Cython compiler then converts

that into a compiled C library which can then be imported in Python code, the same way

as with directly compiled C code. The compilation is normally done via a Python setup.py

file  that  controls  a  regular  C  compiler  (like  cc  or  gcc).  For  details  see  the  Cython

documentation (see links at

http://www.cambridge.org/pythonforbiology

).

Pure  Python  code  is  valid  Cython,  and  you  can  usually  get  a  modest  speed-up  just



running  ordinary  Python  code  through  the  Cython  compiler,  although  this  will  naturally

restrict portability because the code will only run on the processor architecture for which it

was compiled. The Cython implementation of the self-organising map could be made very

similar to the C implementation. As in the Python+C implementation, you could convert

from Python objects to their C equivalent, then do the calculation, then convert the result

back from C objects to Python objects. In particular, a list in Python would get converted

to  a  C  pointer  and  require  memory  allocation  and  freeing.  However,  we  will  avoid  the

complication  of  data  conversion  and  memory  allocation  and  freeing  by  instead  using

NumPy  data  throughout  the  code.  This  makes  the  code  simple  to  write  and  avoids  a

tedious aspect of writing code in C.

We  will  not  explain  all  the  Cython  constructs,  but  the  main  difference  in  syntax  with

ordinary Python is that C data types are specified using cdef. Though Python objects can

also have a type specified, and in general even that makes for faster execution. In addition,



imports of C libraries are done using cimport instead of import. Here we need the NumPy

C ndarray data type. We also need random and exp from the standard NumPy. Finally, we

need the import of cython itself.

from numpy cimport ndarray

from numpy import random, exp

import cython

The  main  function  in  the  Cython  implementation  is  different  from  other  functions  we

have seen in that it has some additional Cython-specific typing for some of the arguments.

In  particular,  we  want  the  inputs  and  spread  arguments  to  be  two-dimensional  NumPy

arrays.  They  both  have  type  double,  which  is  the  default  NumPy  type  for  floating  point

(real number) data.

def selfOrganisingMap(ndarray[double, ndim=2] inputs,

ndarray[double, ndim=2] spread,

size, nsteps):

We then define the incidental variables we use in the code to all be C types rather than

Python  types.  This  includes  the  map  itself,  which  will  be  a  three-dimensional  NumPy

array.  It  is  important  to  use  C  data  types  to  make  Cython  code  fast,  although  some

variables are less important than others for speed.

cdef int nrows, ncols, ninputs, depth, width, height

cdef int step, i, j

cdef double decay

cdef ndarray[double, ndim=3] somap

We then initialise the variables. The somap variable can be initialised using the standard

NumPy random number generation functionality.

nrows, ncols = size

ninputs = len(inputs)

depth = len(inputs[0])

width = len(spread)

height = len(spread[0])

somap = random.random((nrows, ncols, depth))

We then iterate over nsteps in the same way as previously.

for step in range(nsteps):

decay = exp(-step / float(nsteps))

for i in range(ninputs):

updateSomap(somap, inputs[i], spread, nrows, ncols, depth,

width, height, decay)

Finally, we return somap.

return somap

The important work is done in the updateSomap() function. In order to make Cython as

quick  as  possible,  we  turn  off  some  checking  that  would  otherwise  be  done  on  arrays,

using  function  decorators  from  the  cython  module  (with  the  ‘@’  syntax;  see

Chapter  5

).



This is dangerous to do and should only be done if you are very confident that your code

has no array-indexing bugs in it. Python handles such errors graciously, but C does not.

@cython.boundscheck(False)

@cython.nonecheck(False)

@cython.wraparound(False)

cdef void updateSomap(ndarray[double, ndim=3] somap,

ndarray[double, ndim=1] inputs,

ndarray[double, ndim=2] spread,

int nrows, int ncols, int depth,

int width, int height, double decay):

We then define the variables we will use.

cdef int halfWidth, halfHeight, i, j, k, l, m, imin, jmin

cdef double alpha, diff, diff2, diff2min

The  rest  of  the  code  is  very  similar  to  the  C  code.  It  is  important  to  use  the  bracket

notation  [i,  j,  k]  for  array  access  rather  than  [i][j][k]  because  otherwise  sub-arrays  are

constructed, which significantly reduces the speed. It is also important that these indices

be C integers rather than Python integers.

halfWidth = (width-1) // 2

halfHeight = (height-1) // 2

imin = jmin = -1

diff2min = 0.0

for i in range(nrows):

for j in range(ncols):

diff2 = 0.0

for k in range(depth):

diff = somap[i,j,k] - inputs[k]

diff2 += diff * diff

if ((imin == -1) or (diff2 < diff2min)):

imin = i

jmin = j


diff2min = diff2

for k in range(width):

i = (imin + k - halfWidth + nrows) % nrows

for l in range(height):

j = (jmin + l - halfHeight + ncols) % ncols

alpha = decay * spread[k,l]

for m in range(depth):

somap[i,j,m] = (1.0-alpha) * somap[i,j,m] + alpha * inputs[m]

We  could  use  the  following  NumPy  code,  which  is  similar  to  code  we  used  in  the

NumPy implementation, to replace the calculation of imin and jmin:

cdef ndarray[double, ndim=3] diff

cdef ndarray[double, ndim=2] diff2

diff = somap - inputs[i]

diff2 = (diff*diff).sum(axis=2)

cdef int index = dist2.argmin()



imin = index // nrows

jmin = index % nrows

However, it turns out that this is much slower.

Before the above functions can be used, the code must be transformed into binary code

that  can  actually  be  run.  This  is  a  two-step  process  in  Cython.  Initially  the  Python-like

code is converted into the equivalent C and then the C code is compiled into binary code

that  can  be  executed  on  a  particular  type  of  computer  system.

7

 These  conversions  are



fairly easy to do using the Cython system. In essence, we run a setup script with Python,

which from an operating-system prompt would look something like:

>python setup.py build_ext --inplace

The ‘setup.py’ script instructs Cython which Python-like .pyx files to convert into fast,

compiled Python modules, and what the names of those modules should be. We give full

details  of  the  setup  script  and  how  compilation  is  practically  achieved  on  different

computer  systems  at

http://www.cambridge.org/pythonforbiology

.  Our  example  uses

‘somapCython.pyx’  to  create  the  ‘somapCython’  module  (the  actual  compiled  file  is

‘somapCython.so’), which can then be imported into standard Python code:

from somapCython import selfOrganisingMap

And,  if  we  set  appropriate  values  for  the  input  arguments,  the  function  can  be  called

from Python in the normal manner:

mapOut = selfOrganisingMap(inData, spreadArray, mapSize, numSteps)


Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   439   440   441   442   443   444   445   446   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish