Python Programming for Biology: Bioinformatics and Beyond



Tim J. Stevens, Wayne Boucher

quantile normalisation, which is commonly used for DNA microarrays where consistency can be an issue. The process here is to make the distribution of data values in the array match some other, external distribution. This other distribution could come from different microarray data or be a mathematical distribution such as a normal (Gaussian) distribution. The distributions are matched by replacing each real microarray data value with the value from the reference distribution that has equal rank, so the highest value is replaced by the highest reference value, the second highest by the second highest reference value, and so on. While this may seem a little like cheating, quantile normalisation is especially useful if you suspect that the distribution of values in the microarray has been distorted or skewed, but that the order of the values still conveys information.
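As a minimal sketch of the idea, assuming a standard normal distribution as the external reference, the function below replaces each value with the normal quantile of the same rank. The function name quantile_normalise_to_normal and the choice of (i + 0.5)/n probabilities for the reference quantiles are illustrative assumptions, not part of the method described here; the quantiles come from the standard library's statistics.NormalDist.

```python
from statistics import NormalDist

import numpy as np


def quantile_normalise_to_normal(values):
    """Return values replaced by standard-normal quantiles of equal rank."""
    values = np.asarray(values, dtype=float).ravel()
    n = values.size
    normal = NormalDist(mu=0.0, sigma=1.0)
    # Assumed reference: n normal quantiles at probabilities (i + 0.5) / n,
    # which are already in increasing (rank) order
    reference = np.array([normal.inv_cdf((i + 0.5) / n) for i in range(n)])
    # argsort() twice gives the rank of each value (0 = smallest)
    ranks = values.argsort().argsort()
    return reference[ranks]


skewed = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 1000.0])
normalised = quantile_normalise_to_normal(skewed)
print(normalised.round(3))
```

Only the ranks of the input survive, so the outlier 1000.0 simply becomes the largest reference quantile and no longer dominates the scale.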

The quantile normalisation procedure can be done using NumPy, as we illustrate below. The objective is to replace the items in values by selecting the items with the equivalent rank from refData. Note that we don't simply sort the replacement values, because we need the ranks of these numbers in the original data order. First the data array is flattened into a one-dimensional vector and the indices of the values are extracted in size order (.argsort() does this). Hence, order represents the selection that sorts values. For example, if the flattened data is [2.5, 7.1, 0.0, 5.9] then the indices in order are [2, 0, 3, 1] (2 is the position of the smallest value, 0 the position of the next smallest, etc.).
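The argsort() step from this example can be checked directly in NumPy; the array is the example data from the text.

```python
import numpy as np

values = np.array([2.5, 7.1, 0.0, 5.9])
order = values.argsort()        # indices that would sort values, smallest first
print(order.tolist())           # [2, 0, 3, 1]
print(values[order].tolist())   # [0.0, 2.5, 5.9, 7.1]
```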

def normaliseQuantile(self, refData, channel=0):
  # refData could come from a different channel

  values = self.data[channel].flatten()
  order = values.argsort()  # positions of values, smallest first

Similarly, the reference distribution refData is flattened into a vector, refValues (refData is assumed to have the same number of elements as a single channel of self.data). Then refValues is sorted, putting its elements into size (and hence rank) order, so that we obtain an array of replacement values. The array of indices in sorted value order (order) is itself subject to .argsort(). This may seem confusing, but what you get is an array of the ranks of each value, and thus a mapping from the original values to the replacement reference values. For example, if values is [2.5, 7.1, 0.0, 5.9] then refSelection is [1, 3, 0, 2], where each number is the size rank (starting at zero) of the equivalent data value. Once defined, refSelection allows us to redefine values by taking, for each position, the reference value with the same rank. Finally, values is reshaped to the original number of rows and columns and placed back into self.data for the channel.

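  refValues = refData.flatten()   # flatten() returns a copy, so refData is unchanged
  refValues.sort()                # reference values in rank order
  refSelection = order.argsort()  # rank of each value, in original order
  values = refValues[refSelection]
  self.data[channel] = values.reshape((self.nRows, self.nCols))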
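To see the two argsort() calls and the final lookup working together outside the class, here is a stand-alone sketch. The function name quantile_normalise is hypothetical; it simply repeats the steps of normaliseQuantile() on plain NumPy arrays of any shape, using np.sort(..., axis=None) to flatten and sort the reference in one call.

```python
import numpy as np


def quantile_normalise(data, ref_data):
    """Return a copy of data whose values are taken, by rank, from ref_data.

    Hypothetical stand-alone version of the normaliseQuantile() steps;
    data and ref_data are assumed to hold the same number of elements.
    """
    values = data.flatten()
    order = values.argsort()                   # positions of values, smallest first
    ref_values = np.sort(ref_data, axis=None)  # reference flattened and sorted
    ref_selection = order.argsort()            # rank of each value, original order
    return ref_values[ref_selection].reshape(data.shape)


values = np.array([2.5, 7.1, 0.0, 5.9])
print(values.argsort().argsort().tolist())  # [1, 3, 0, 2]

reference = np.array([10.0, 40.0, 20.0, 30.0])
print(quantile_normalise(values, reference).tolist())  # [20.0, 40.0, 10.0, 30.0]
```

The made-up reference values only matter through their sorted order, [10.0, 20.0, 30.0, 40.0], which the ranks [1, 3, 0, 2] then index.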

We can do a similar thing to quantile normalise each row separately. Here, however, we can use an internal reference distribution: the average over all the rows. We do not flatten the data arrays into a vector, as each row is a vector and is dealt with separately. Accordingly, we determine the order of elements of increasing value in each row (orders). refValues is defined by sorting the values in each row and taking the average of each column (so each is the average of the values with equivalent rank from each row). The rows of self.data are then replaced with the refValues averages of matching rank.

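def normaliseRowQuantile(self, channel=0):

  channelData = self.data[channel]
  orders = channelData.argsort(axis=1)  # sort order within each row

  sortedRows = channelData.copy()       # copy, so sorting leaves self.data intact
  sortedRows.sort(axis=1)
  refValues = sortedRows.mean(axis=0)   # average value at each rank, over all rows

  rows = range(self.nRows)
  # argsort() of each row's order gives the rank of each element in that row
  self.data[channel,rows,:] = refValues[orders[rows,:].argsort()]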
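The same row-wise procedure can be sketched on a plain 2D array. Here row_quantile_normalise is a hypothetical name and the small matrix is made-up illustration data; calling argsort() twice along axis 1 gives the rank of each element within its own row.

```python
import numpy as np


def row_quantile_normalise(matrix):
    """Quantile normalise each row of a 2D array against the mean sorted row."""
    # argsort() twice along axis 1: rank of each element within its own row
    ranks = matrix.argsort(axis=1).argsort(axis=1)
    # Sort every row, then average down the columns: mean value at each rank
    ref_values = np.sort(matrix, axis=1).mean(axis=0)
    # Indexing with the 2D rank array gives a result of the same shape
    return ref_values[ranks]


matrix = np.array([[5.0, 2.0, 3.0],
                   [4.0, 1.0, 6.0]])
print(row_quantile_normalise(matrix).tolist())  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Every row ends up containing the same set of values (the rank averages 1.5, 3.5 and 5.5); only their arrangement differs, following the original order within each row.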

We can test the quantile normalisation using example data loaded from an image, using the data in layer 1 (green) as the reference to normalise layer 0 (red).

imgFile = 'examples/RedGreenArray.png'
rgArray = loadArrayImage(imgFile, 'TwoChannel', 18, 17)
rgArray.normaliseQuantile(rgArray.data[1], 0)  # green (1) is the reference for red (0)
rgArray.makeImage(25).show()
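If the example image is not at hand, the result can still be checked on synthetic data. ToyArray below is a hypothetical stand-in that carries only the data, nRows and nCols attributes used by normaliseQuantile() (whose body is copied from above), and the random "red" and "green" channels are made-up data. After normalisation the red channel should hold exactly the green values, arranged in the red channel's original rank order.

```python
import numpy as np


class ToyArray:
    """Hypothetical stand-in holding only data, nRows and nCols."""

    def __init__(self, channels):
        self.data = np.array(channels, dtype=float)  # shape (nChannels, nRows, nCols)
        _, self.nRows, self.nCols = self.data.shape

    def normaliseQuantile(self, refData, channel=0):
        # Same steps as the normaliseQuantile() method in the text
        values = self.data[channel].flatten()
        order = values.argsort()
        refValues = refData.flatten()
        refValues.sort()
        refSelection = order.argsort()
        values = refValues[refSelection]
        self.data[channel] = values.reshape((self.nRows, self.nCols))


rng = np.random.default_rng(0)
red = rng.exponential(scale=2.0, size=(6, 5))        # made-up skewed channel
green = rng.normal(loc=5.0, scale=1.0, size=(6, 5))  # made-up reference channel

toy = ToyArray([red, green])
redRanks = toy.data[0].flatten().argsort().argsort()  # ranks before normalising
toy.normaliseQuantile(toy.data[1], 0)

# Red now holds exactly the green values...
sameValues = np.array_equal(np.sort(toy.data[0], axis=None),
                            np.sort(toy.data[1], axis=None))
# ...arranged in red's original rank order
sameRanks = np.array_equal(toy.data[0].flatten().argsort().argsort(), redRanks)
print(sameValues, sameRanks)  # True True
```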



