Python Programming for Biology: Bioinformatics and Beyond



Download 7,75 Mb.
Pdf ko'rish
bet295/514
Sana30.12.2021
Hajmi7,75 Mb.
#91066
1   ...   291   292   293   294   295   296   297   298   ...   514
Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

A ‘

Peak

’ class

One way to store information about peaks would be to use simple data structures, like lists

or  arrays,  which  would  contain  values  corresponding  to  positions,  intensities,  linewidths

etc. However, here we will represent peaks using a custom Python object by constructing a

class called Peak. This will link together all of the parameters that relate to the same peak

and allow us to associate special functions for the class (i.e. methods) to calculate various

peak properties. See

Chapter 7

for more detailed discussion about how classes are written,

and the role of self. and the constructor function __init__.

In the construction of the Peak class we will insist that a peak, at the very least, has a

reference to the data it derives from (self.data) and a position (self.position). The position

might be a whole number, where the peak lies at a point on the grid, or it may lie between

grid points. The latter indicates that some kind of interpolation has been done, to specify a

more  accurate  position.  So  that  we  can  store  both  on-grid  and  interpolated  values

self.point records the nearest grid point to self.position.




Keeping  a  reference  to  the  underlying  (often  frequency)  data  means  that  we  can  later

interpolate and fit the peak. Hence we include dataHeight, the height of the peak, and the

linewidth.  These  are  optional  parameters  that  may  be  calculated  if  not  specified  in  the

initial instance (if they are None).

from numpy import ix_, outer, zeros

class Peak:

def __init__(self, position, data, dataHeight=None,

linewidth=None):

self.position = tuple(position)

self.data = data

The on-grid points for the peaks are initially set up by rounding the input position to the

nearest integer, remembering that we have a value for each dimension.

self.point = tuple([int(round(x)) for x in position])

If  the  dataHeight  is  not  provided  then  it  is  calculated  as  being  the  value  of  the

underlying  data  at  self.point.  A  better  alternative  would  be  to  do  some  kind  of  simple

quadratic fitting.

if dataHeight is None:

dataHeight = data[self.point]

self.dataHeight = dataHeight

If  the  linewidth  is  not  initially  specified  then  it  is  calculated  using  the

_calcHalfHeightWidth() function, which we describe below.

if linewidth is None:

linewidth = self._calcHalfHeightWidth()

self.linewidth = linewidth

Finally in the constructor we initialise the fitted parameters as null values.

self.fitAmplitude = None

self.fitPosition = None

self.fitLinewidth = None

The  linewidth  is  calculated  by  the  function  below  as  the  full  width  at  half  the  peak

height, noting that there is a different width in each dimension. The method to do this goes

through  each  dimension  in  the  data  and  finds  the  positions  on  the  two  sides  of  the  peak

that would be at half the maximum height (see

Figure 19.4

), using _findHalfPoints(). The

width is simply the difference between these, which is added to the list that is passed back.

def _calcHalfHeightWidth(self):

dimWidths = []

for dim in range(self.data.ndim):




posA, posB = self._findHalfPoints(dim)

width = posB - posA

dimWidths.append(width)

return dimWidths

The  function  _findHalfPoints  looks  in  a  given  dimension  for  the  places  either  side  of

the  peak  position  where  the  half-height  is  reached.  The  first  issue  to  face  is  that  it  is

possible  to  hit  the  edge  of  the  grid,  where  the  data  is  defined,  before  the  half-height  is

reached. We have taken a simple approach of just stopping at the edge of the grid, but an

alternative would be to wrap round and continue on the other side of the grid. However, if

you took the latter approach you would need to make sure the code does not end up in an

indefinite  loop,  going  around  and  around,  because  the  values  are  possibly  all  above  the

half-height.

The initial height is the absolute value of self.dataHeight, and halfHt is half that, i.e. the

value  we  are  looking  for.  Using  the  absolute  value  simplifies  the  code  a  little  because  it

means we won’t need to use separate clauses if the peaks go in the negative direction. We

define  variables  to  refer  to  the  underlying  data  (data)  and  peak  grid  position  (point)  by

using self. to access attributes that belong to the Peak  class.  We  are  not  forced  to  define

new variables here, but as a general practice we minimise use of self. for speed reasons.

def _findHalfPoints(self, dim):

height = abs(self.dataHeight)

halfHt = 0.5 * height

data = self.data

point = self.point

The search will be done by investigating the height values associated with a testPoint

that varies in position along the tested dimension. Initially this is a copy of the peak’s grid

point  so  that  the  positions  are  correct  in  the  other  (unsearched)  dimensions.  Note  that

testPoint  is  a  list  because  it  can  be  changed  internally  (unlike  a  tuple)  as  we  assign

different  positions  in  one  of  the  dimensions.  The  variables  posA  and  posB  are  the  one-

dimensional  search  positions,  corresponding  to  the  two  sides  of  the  peak,  which  will  be

refined and passed back at the end.

testPoint = list(point)

posA = posB = point[dim]

The  actual  search  involves  a  while  loop  that  tests  whether  the  search  position  is  still

within bounds. The first search is backwards, i.e. towards zero, using posA. The value of

posA  is  decreased  by  one  to  get  the  previous  point  in  the  data  and  this  is  set  within  the

correct  dimensional  index  of  testPoint,  so  that  we  can  access  the  absolute  value  of  the

(possibly  multi-dimensional)  data.  Note  NumPy  requires  that  we  convert  testPoint  to  a

tuple for use as an index of data.

prevValue = height

while posA > 0: # Search backwards

posA -= 1

testPoint[dim] = posA

value = abs(data[tuple(testPoint)])



At  the  end  of  the  loop  there  is  a  check  to  see  if  the  half-height  has  been  reached,

breaking  out  of  the  loop  if  it  has.  Linear  interpolation  is  almost  always  needed,  because

the half-height is unlikely to occur exactly on a grid point. Hence, if the data value is less

than the half-height we know we’ve over-stepped, and we add back a fraction of a point.

This fraction is calculated as the drop in height of value below halfHt as a proportion of

the drop in height from one point to the next.

if value <= halfHt:

posA += (halfHt-value)/(prevValue-value)

break

prevValue = value



And we repeat the procedure for posB, searching in the other direction, though this time

the last data point in the data is recorded from the .shape of the data array (its size) so that

we know when to stop. Then at the end we pass back the two positions that correspond to

the half-height.

lastPoint = data.shape[dim] - 1 # Shape is size

prevValue = height

while posB < lastPoint-1: # Search forwards

posB += 1

testPoint[dim] = posB

value = abs(data[tuple(testPoint)])

if value <= halfHt:

posB -= (halfHt-value)/(prevValue-value)

break

prevValue = value



return posA, posB


Download 7,75 Mb.

Do'stlaringiz bilan baham:
1   ...   291   292   293   294   295   296   297   298   ...   514




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©hozir.org 2024
ma'muriyatiga murojaat qiling

kiriting | ro'yxatdan o'tish
    Bosh sahifa
юртда тантана
Боғда битган
Бугун юртда
Эшитганлар жилманглар
Эшитмадим деманглар
битган бодомлар
Yangiariq tumani
qitish marakazi
Raqamli texnologiyalar
ilishida muhokamadan
tasdiqqa tavsiya
tavsiya etilgan
iqtisodiyot kafedrasi
steiermarkischen landesregierung
asarlaringizni yuboring
o'zingizning asarlaringizni
Iltimos faqat
faqat o'zingizning
steierm rkischen
landesregierung fachabteilung
rkischen landesregierung
hamshira loyihasi
loyihasi mavsum
faolyatining oqibatlari
asosiy adabiyotlar
fakulteti ahborot
ahborot havfsizligi
havfsizligi kafedrasi
fanidan bo’yicha
fakulteti iqtisodiyot
boshqaruv fakulteti
chiqarishda boshqaruv
ishlab chiqarishda
iqtisodiyot fakultet
multiservis tarmoqlari
fanidan asosiy
Uzbek fanidan
mavzulari potok
asosidagi multiservis
'aliyyil a'ziym
billahil 'aliyyil
illaa billahil
quvvata illaa
falah' deganida
Kompyuter savodxonligi
bo’yicha mustaqil
'alal falah'
Hayya 'alal
'alas soloh
Hayya 'alas
mavsum boyicha


yuklab olish