Python Programming for Biology: Bioinformatics and Beyond



Download 7,75 Mb.
Pdf ko'rish
bet296/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   292   293   294   295   296   297   298   299   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

Peak picking

Now  that  we  have  defined  the  Peak  class,  we  are  ready  to  actually  identify  or  pick  the

peaks  from  the  frequency  data.  There  are  many  subtle  issues  here,  and  we  will  only

provide  the  simplest  peak  picker,  with  some  commentary  about  what  can  be  done  to

improve  it.  How  best  to  proceed  will  depend  on  the  specifics  of  the  type  of  data  being

analysed,  and  it  is  unlikely  that  any  peak  picker  will  work  in  all  circumstances  for  all

possible data sets.

Here  we  will  just  look  for  local  maxima  above  a  specified  threshold.  This  works

reasonably  well  for  data  that  is  not  especially  noisy  or  crowded.  For  this  kind  of  simple

operation  there  is  existing  functionality  from  SciPy  and  NumPy  that  can  be  used  to  do

most of the work, although the way they are used here is not necessarily speed-efficient.

The  maximum_filter()function  will  be  used  to  find  maxima  in  the  data.  Although  this  is

part of SciPy’s multi-dimensional image-processing module the data need not actually be

an image. It is notable that the minimum_filter() function also exists, which could be used

to find negative peaks.

from scipy.ndimage.filters import maximum_filter




from numpy import argwhere

The  function  findPeaks  takes  an  array  of  data  (of  arbitrary  dimensionality)  and  a

threshold  value.  Only  maxima  above  this  threshold  will  be  considered  as  peaks,  and  in

general  the  threshold  will  be  set  to  distinguish  the  required  signals  from  the  noise.  The

size represents the region to consider when searching for maxima. The default value of 3

means to consider a central point (potentially the peak centre) and its nearest neighbours

either side, which is reasonable if the data is not too noisy. The mode argument specifies

how  the  data  is  treated  at  the  boundary.  When  mode=‘wrap’  this  means  that  the  data  is

wrapped around at the boundaries, and this is often the most appropriate value for Fourier

transformed data, which is naturally periodic. Sometimes, though, the data is truncated, in

which  case  it  is  more  appropriate  to  use  mode=‘constant’,  which  in  effect  treats  the

boundary  as  the  end  of  the  data.  (In  the  previous  section  we  only  had  code  for  the

equivalent of mode=‘constant’.)

def findPeaks(data, threshold, size=3, mode='wrap'):

peaks = []

if (data.size == 0) or (data.max() < threshold):

return peaks

For  the  NumPy  array  data  it  is  possible  to  filter  on  a  threshold  just  by  doing  data  >

threshold.  This  creates  an  array,  with  the  same  size  and  dimensions  as  data,  where  each

entry is True or False, depending on whether the corresponding value in data is above the

threshold.

boolsVal = data > threshold

Next  we  use  maximum_filter()  to  find  local  maxima  in  the  data,  considering  the

specified  size  to  inspect.  This  function  returns  a  copy  of  the  input  array  where  the  non-

maximal  points  have  been  set  to  zero.

9

 Then  we  create  another  array  of  True  and  False



values,  corresponding  to  whether  the  filtered  points  have  their  original  values  (data  ==

maxFilter), i.e. we will have a True at each maximum.

maxFilter = maximum_filter(data, size=size, mode=mode)

boolsMax = data == maxFilter

The  peak  points  that  are  both  above  the  threshold  and  local  maxima  are  found  as  the

intersection (logical AND operation) between values from the two arrays of Booleans.

boolsPeak = boolsVal & boolsMax

Next we find the indices of the True values in the boolsPeak array, which correspond to

the positions (row, column etc.) of the peaks in data. The NumPy function we use to get

the indices is argwhere(), which gives an array of the non-zero (true) positions in the form

((row1, col1, …), (row2, col2, …) …), i.e. with one group of coordinates for each peak. It

is notable that nonzero() is often used to find true values in an array, but this would give a

result in the form ((row1, row2, …), (col1, col2, …) …), which is ideal for indexing array

elements but inconvenient here.




indices = argwhere(boolsPeak) # Position indices for True

Finally Peak objects are created for these grid positions and appended to the peaks list

that is passed back.

for position in indices:

position = tuple(position)

height = data[position]

peak = Peak(position, data, height)

peaks.append(peak)

return peaks

There  are  improvements  that  could  be  made  to  the  algorithm.  Instead  of  looking  in  a

rectangular  region  for  local  maxima,  a  non-rectangular  footprint  could  be  specified.

Sometimes  it  is  also  worth  checking  that  any  putative  peak  has  a  shape  such  that  a

sufficient  drop  is  made  between  the  maximum  height  and  the  minimum  value  before  it

turns up again, in any direction. Also, if the data is such that negative height peaks are also

of interest, then the algorithm can be modified to look for minima (below zero) by using

minimum_filter() when the threshold is negative.




Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   292   293   294   295   296   297   298   299   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish