Python Programming for Biology: Bioinformatics and Beyond



Download 7,75 Mb.
Pdf ko'rish
bet251/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   247   248   249   250   251   252   253   254   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Hierarchical clustering

The next example moves on to rearranging the data (swapping rows and columns) so that

we  can  better  see  similarities  or  correlations  in  the  elements.  This  rearrangement  will

involve hierarchical clustering, and uses the same idea as building a phylogenetic tree (see

Chapter  14

),  so  that  the  most  similar  rows  and  columns  are  placed  next  to  one  another.

The aim is that any rows or columns that show similar patterns will be placed together in

easily visualised sub-groups. The image-generating method described earlier can then be

used to make a visual representation (see

Figure 16.3

).

The  function  __hierarchicalRowCluster  takes  an  array  of  data  as  a  two-dimensional



matrix and uses this to build a matrix of the distances between each row, so the element

distanceMatrix[i][j] is the distance between row i and row j. By ‘distance’ here we mean

the  similarity  between  different  rows,  and  measure  this  here  as  the  Euclidean  distance

between row vectors (root of sum of differences squared). In different situations you could

consider  other  means  of  estimating  the  similarity  between  rows.

6

It should be noted that



the function name begins with a double underscore ‘__’. In Python a function starting, and

not ending, with a double underscore is effectively private,

7

so it can’t be called directly



from an instance of a class. A function is kept private like this if you don’t want to expose

it as a normal method (a function linked to a class), and only use it internally within the

class definition. Hence, here we make the clustering function private and use it in the inner

workings  of  Microarray  but  do  not  allow  method  calls  from  an  object  like

microrray1.__hierarchicalRowCluster(data).

def __hierarchicalRowCluster(self, dataMatrix):

from SeqVariation import neighbourJoinTree

n = len(dataMatrix[0])




distanceMatrix = zeros((n, n), float)

We loop through each layer and row in the dataMatrix and take this row away from the

whole  array,  so  we  get  a  matrix  of  differences  to  this  row.  The  differences  are  squared,

added  up  along  the  row  and  the  square  root  is  taken,  giving  a  distance  from  row  to  all

other  rows.  These  distances  are  then  placed  at  position  i  for  this  row  within  the

distanceMatrix.

for channelData in dataMatrix:

for i, row in enumerate(channelData):

diffs = channelData - row

sqDiffs = diffs * diffs

sqDists = sqDiffs.sum(axis=1)

distanceMatrix[i,:] += sqDists

The  function  neighbourJoinTree  which  was  described  previously  in

Chapter  14

 is

reused here to create a hierarchical tree from the distance matrix. Note that we use .tolist()



to make a list of lists from the array of distances because the tree-generating function did

not make use of NumPy arrays earlier in the book.

tree, joinOrder = neighbourJoinTree(distanceMatrix.tolist())

The hierarchical tree structure is then interrogated by following its branches, effectively

flattening  it  into  a  single  list  of  node  indices,  which  represents  the  order  of  rows  for  the

shuffled  matrix.  The  list  rowOrder  is  initially  defined  as  a  copy  of  the  tree,  but  is  then

processed to insert the contents of any sub-lists (or tuples) directly into the main list. The

use  of  a  while  loop  here  is  because  the  size  of  the  rowOrder  list  will  grow  as  the  tree

branches are flattened. We record an index i, which represents the position in the list that

is  being  processed,  and  continue  until  the  end  of  the  list.  The  inner  while  loop  checks

whether the list item at position i is an integer (the end of a tree branch), and if it is not the

contents of the sub-list (rowOrder[i]) are inserted into the main list using the slice notation

[i:i+1], replacing the original range covering the sub-list with its contents. The use of the

second while loop is needed because the new element at position i may itself be another

sub-list.

rowOrder = list(tree)

i = 0

while i < len(rowOrder):



while not isinstance(rowOrder[i], int):

rowOrder[i:i+1] = rowOrder[i]

i += 1

return rowOrder



The row clustering routine is actually used by the hierarchicalCluster function, which is

not private and so provides the method microarray.hierarchicalCluster(). This clusters the

rows in self.data and then reorders the array according to the row hierarchy. The resulting

data  array  is  then  transposed  (flip  rows  for  columns)  and  clustered  again,  effectively

clustering the columns, which forms the new column order.



def hierarchicalCluster(self):

rows = self.__hierarchicalRowCluster(self.data)

swapped = transpose(self.data, axes=(0,2,1))

cols = self.__hierarchicalRowCluster(swapped)

The reordered rows and cols  are  used  as  indices  into  the  NumPy  arrays  to  shuffle  the

data,  according  to  the  hierarchical  clustering,  noting  that  we  don’t  affect  self.data.

8

 The


data is then used to make an entirely new Microarray object, with a different order for its

rows and columns.

data = self.data[:,rows] # Rearrange

data = data[:,:,cols]

# data = array(data.tolist()) # to fix PIL.Image bug

name = self.name + '-Sorted'

rowData = [self.rowData[i] for i in rows]

colData = [self.colData[j] for j in cols]

sortedArray = Microarray(name, data, rowData, colData)

return sortedArray

When  testing  the  hierarchical  clustering  we  get  back  a  new  Microarray  object  with  a

different order of rows and columns:

sortedArray = rgArray.hierarchicalCluster()

sortedArray.makeImage(20).show()

print(rgArray.rowData)

print(sortedArray.rowData)

# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17] – Original

rows


# [4, 13, 2, 11, 6, 15, 5, 14, 7, 16, 1, 10, 3, 12, 8, 17, 0, 9] – Shuffled

rows



Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   247   248   249   250   251   252   253   254   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish