Python Programming for Biology: Bioinformatics and Beyond



Download 7,75 Mb.
Pdf ko'rish
bet399/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   395   396   397   398   399   400   401   402   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

A Kohonen map in Python

The Python function below encodes a simple self-organising map. It makes extensive use

of  array  functionality  imported  from  NumPy,  which  in  this  instance  reduces  calculation

time  for  large  maps  and,  perhaps  more  importantly  for  this  book,  improves  brevity  by

avoiding  loops.  Although  some  readers  may  find  the  use  of  vector/array  operations

disconcerting  to  start  with,  the  approach  does  focus  attention  on  the  higher-level

operations and reasoning that underpin the method.

Firstly,  we  import  a  couple  of  modules  to  give  access  to  array  operations  and  the

exponential  function.  Then  we  define  the  function  that  takes  a  list  of  inputs  (feature

vectors  defined  as  numpy.array  objects)  and  something  we  have  called  spread,  which

represents  the  strength  of  connectivity  between  close  nodes  on  the  grid;  this  is  a  square

array (a matrix) of weights where the centre represents the amount of pull from the input

data  on  the  best-matching  grid  node  during  learning  and  the  off-centre  values  determine

how much this influence spreads to adjacent grid positions. The size input is the number

of  rows  and  columns  in  the  grid  map  and  steps  is  simply  the  number  of  learning  cycles

that will be performed.

import numpy

from math import exp

def selfOrganisingMap(inputs, spread, size, steps=1000):

We extract the number of rows and columns for the grid from the input size and then

determine vecLen, which represents the size of the feature vectors (e.g. three for a colour

containing  red,  green  and  blue  values).  These  numbers  are  then  used  to  make  the  initial

version  of  the  map  grid,  which  is  named  somap.  This  grid  naturally  has  the  required

number of rows and columns, but also has a depth axis so that each grid location contains

a  number  of  feature  values  (a  feature  vector),  hence  the  input  axis  sizes  for  the  map  are

nRows, nCols, vecLen.

nRows, nCols = size

vecLen = len(inputs[0])

somap = numpy.random.rand(nRows, nCols, vecLen)

The  next  thing  to  initialise  is  an  array  that  determines  the  influence  (pull)  of  an  input

data  vector  on  a  region  of  the  map  grid.  The  input  spread  array  gives  the  basic  form  of

this;  however,  this  needs  to  be  applied  to  all  of  the  feature  values  in  the  map,  however

deep  the  feature  vectors;  all  the  values  along  the  vector  are  moved  closer  to  the

corresponding values from the best-match input. The spread array is only one value deep,

but  the  map  is  vecLen  deep,  hence  we  define  the  influence  array,  which  uses  the  values

from spread, repeated vecLen times. Building a deep array of this kind upfront means we

do not have to loop through the features to apply the influence weighting. The influence

array is simple to create; by using the ‘*’ operation on a list the spread array is replicated,

7

and the dstack()



8

function stacks the arrays on top of one another along the depth axis. The

infWidth is calculated to find the radius of the influence, as needed later in the calculation.

Then to improve clarity and speed we define the function makeMesh, which is simply a




renaming of the cryptic numpy.ix_ function; this takes lists of row and column indices and

creates  a  ‘mesh’  where  the  indices  intersect,  and  hence  allows  the  extraction  of  a  sub-

matrix from a larger matrix.

influence = numpy.dstack([spread]*vecLen) # One for each feature

infWidth = (len(spread)-1) // 2

makeMesh = numpy.ix_ # Ugly

With  the  initialisation  done  the  code  now  goes  through  the  main  learning  steps.  In

Python 2, the xrange() function is used instead of range() because the number of steps may

be  fairly  large  and  the  latter  would  make  a  large  list  in  memory,  not  just  yield  a  step

number, which is all we need. In Python 3, the range() function behaves like xrange() in

Python 2. For each step the decay variable is calculated as the exponent of the proportion

we have gone through all steps; this is then used to diminish the pull of the input data on

the map vectors so that initially they can change a large amount but later stabilise. Then

for each step we begin a loop through the array of input data vectors.

for s in range(steps): # xrange in Python 2

decay = exp(-s/float(steps))

for vector in inputs:

Inside  the  loop,  for  each  input,  we  calculate  the  difference  of  the  current  map  to  the

vector. This is an array operation in compact NumPy notation where same vector is taken

away from each of the feature vectors across all of the rows and columns of the grid; the

result is an array the same size as the grid, with a difference vector for each grid position.

This  difference  array  is  then  squared  in  an  element-wise  manner  (not  matrix

multiplication),  then  the  square  differences  are  added  up  along  the  depth  (feature)  axis.

The result dist2 is effectively a distance (squared) for each point on the grid to the current

input, to give a measure of similarity.

diff = somap-vector

diff2 = diff*diff

dist2 = diff2.sum(axis=2)

The  argmin()  function  quickly  finds  the  index  in  the  array  representing  the  minimum

vector distance (most similar grid position); unfortunately, however, the index given back

is the position in the array as if it were flattened into one long list. Thus, we then have to

perform more calculations to determine what the column and the row of this index are; the

row is found using integer division (the number of complete rows covered by the index),

and the column is a remainder of the index from the start of the row.

index = dist2.argmin()

row = index // nRows

col = index % nRows

Given the row and column of the best-matching feature vector in the grid, we determine

lists of rows and columns which cover the area of the map that will be ‘pulled’ towards the

input. These lists are a range of values that goes infWidth positions either side of the best-

matching  row  or  column.  Note  the  second  argument  for  range()  has  one  added  to  it,



because  the  second  argument  is  a  limit  that  is  not  included  on  the  output.  The  row  and

column numbers (x and y)  are  subject  to  a  modulo  ‘%’ operation

9

 so  that  if  values  from



the  range  fall  off  the  edge  of  the  grid  (less  than  zero  or  greater  than  last  position)  the

calculated remainder continues the row/column on the opposite side. Effectively this joins

the edges of the grid so that it wraps round in a continuous way; it needn’t be done like

this, but edge effects are helpfully removed.

rows = [x % nRows for x in range(row-infWidth, row+1+infWidth)]

cols = [y % nCols for y in range(col-infWidth, col+1+infWidth)]

Finally  in  the  loop,  the  rows  and  columns  are  used  to  make  a  mesh,  where  they

intersect,  that  is  then  used  to  extract  sub-matrices  for  the  grid  map  and  difference  array.

The  sub-matrix  part  of  the  map  is  the  region  that  will  be  influenced  by  the  input  vector.

The  sub-matrix  of  the  map  is  moved  towards  the  input  feature  vector  by  subtracting  the

corresponding  sub-matrix  of  the  difference  array  (reducing  the  difference  between  map

and  input).  Note  that  the  sub-matrix  of  difference  values  (diff[mesh])  is  scaled  by  two

things: the influence of the input vector on the grid points near to the best-match position

and the scaling factor that decays during the course of the learning. The influence array is

the  same  size  and  shape  as  the  sub-matrix,  but  decay  is  a  simple  number.  The  scaling

means the amount of adjustment of the map grid towards the input vector diminishes both

with  the  in-map  distance  from  the  best-match  point  and  over  the  course  of  repeating  the

procedure.  Finally,  after  the  loop  the  grid  array  somap  (of  mapped  feature  vectors)  is

passed back.

mesh = makeMesh(rows,cols)

somap[mesh] -= diff[mesh] * influence * decay

return somap

To test the function we first define a numpy.array which will determine how far across

the  self-organising  map  (grid)  the  input  influence  spreads.  The  centre  (scoring  1.0  here)

represents the best-match position in the map.

spread = numpy.array([[0.0, 0.10, 0.2, 0.10, 0.0],

[0.1, 0.35, 0.5, 0.35, 0.1],

[0.2, 0.50, 1.0, 0.50, 0.2],

[0.1, 0.35, 0.5, 0.35, 0.1],

[0.0, 0.10, 0.2, 0.10, 0.0]])

As a test we will make some random input; an array 400 (20×20) vectors of length 3.

Each feature vector will represent a different colour in terms of red, green and blue values

between zero and one.

rows, cols = 20, 20

testInput = numpy.random.rand(rows * cols, 3)

The  self-organising  map  is  run  for  100  iterations,  and  will  arrange  the  values  on  a

20×20 grid. Note that the input is of length rows × cols so that we have exactly one data

point (input colour vector) for each map position. This needn’t be the case, but is good to

use for this example because it allows a 1-to-1 mapping of inputs to grid positions; thus



the  operation  effectively  performs  shuffling  of  the  inputs  to  make  a  2D  map  of  colour

similarity.

som = selfOrganisingMap(testInput, spread, (rows, cols), 100)

We can then view the results of the organised map (look at the feature vectors for each

node in the grid) by converting the numpy.array into a real colour image that we can view

on  screen.  The  conversion  of  an  array  of  numbers  to  a  displayable,  graphical  image  is

discussed in

Chapter 18

.

from PIL import Image



colors = som*255

colors = colors.astype(numpy.uint8)

img1 = Image.fromarray(colors, 'RGB')

img1.save('som.png', 'PNG')

img1.show()


Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   395   396   397   398   399   400   401   402   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish