Python Programming for Biology: Bioinformatics and Beyond

Download 7,75 Mb.

Pdf ko'rish

bet	295/514
Sana	30.12.2021
Hajmi	7,75 Mb.
	#91066

1 ... 291 292 293 294 295 296 297 298 ... 514

Bog'liq
[Tim J. Stevens, Wayne Boucher] Python Programming

A ‘

Peak

’ class

One way to store information about peaks would be to use simple data structures, like lists

or arrays, which would contain values corresponding to positions, intensities, linewidths

etc. However, here we will represent peaks using a custom Python object by constructing a

class called Peak. This will link together all of the parameters that relate to the same peak

and allow us to associate special functions for the class (i.e. methods) to calculate various

peak properties. See

Chapter 7

for more detailed discussion about how classes are written,

and the role of self. and the constructor function __init__.

In the construction of the Peak class we will insist that a peak, at the very least, has a

reference to the data it derives from (self.data) and a position (self.position). The position

might be a whole number, where the peak lies at a point on the grid, or it may lie between

grid points. The latter indicates that some kind of interpolation has been done, to specify a

more accurate position. So that we can store both on-grid and interpolated values

self.point records the nearest grid point to self.position.

Keeping a reference to the underlying (often frequency) data means that we can later

interpolate and fit the peak. Hence we include dataHeight, the height of the peak, and the

linewidth. These are optional parameters that may be calculated if not specified in the

initial instance (if they are None).

from numpy import ix_, outer, zeros

class Peak:

def __init__(self, position, data, dataHeight=None,

linewidth=None):

self.position = tuple(position)

self.data = data

The on-grid points for the peaks are initially set up by rounding the input position to the

nearest integer, remembering that we have a value for each dimension.

self.point = tuple([int(round(x)) for x in position])

If the dataHeight is not provided then it is calculated as being the value of the

underlying data at self.point. A better alternative would be to do some kind of simple

quadratic fitting.

if dataHeight is None:

dataHeight = data[self.point]

self.dataHeight = dataHeight

If the linewidth is not initially specified then it is calculated using the

_calcHalfHeightWidth() function, which we describe below.

if linewidth is None:

linewidth = self._calcHalfHeightWidth()

self.linewidth = linewidth

Finally in the constructor we initialise the fitted parameters as null values.

self.fitAmplitude = None

self.fitPosition = None

self.fitLinewidth = None

The linewidth is calculated by the function below as the full width at half the peak

height, noting that there is a different width in each dimension. The method to do this goes

through each dimension in the data and finds the positions on the two sides of the peak

that would be at half the maximum height (see

Figure 19.4

), using _findHalfPoints(). The

width is simply the difference between these, which is added to the list that is passed back.

def _calcHalfHeightWidth(self):

dimWidths = []

for dim in range(self.data.ndim):

posA, posB = self._findHalfPoints(dim)

width = posB - posA

dimWidths.append(width)

return dimWidths

The function _findHalfPoints looks in a given dimension for the places either side of

the peak position where the half-height is reached. The first issue to face is that it is

possible to hit the edge of the grid, where the data is defined, before the half-height is

reached. We have taken a simple approach of just stopping at the edge of the grid, but an

alternative would be to wrap round and continue on the other side of the grid. However, if

you took the latter approach you would need to make sure the code does not end up in an

indefinite loop, going around and around, because the values are possibly all above the

half-height.

The initial height is the absolute value of self.dataHeight, and halfHt is half that, i.e. the

value we are looking for. Using the absolute value simplifies the code a little because it

means we won’t need to use separate clauses if the peaks go in the negative direction. We

define variables to refer to the underlying data (data) and peak grid position (point) by

using self. to access attributes that belong to the Peak class. We are not forced to define

new variables here, but as a general practice we minimise use of self. for speed reasons.

def _findHalfPoints(self, dim):

height = abs(self.dataHeight)

halfHt = 0.5 * height

data = self.data

point = self.point

The search will be done by investigating the height values associated with a testPoint

that varies in position along the tested dimension. Initially this is a copy of the peak’s grid

point so that the positions are correct in the other (unsearched) dimensions. Note that

testPoint is a list because it can be changed internally (unlike a tuple) as we assign

different positions in one of the dimensions. The variables posA and posB are the one-

dimensional search positions, corresponding to the two sides of the peak, which will be

refined and passed back at the end.

testPoint = list(point)

posA = posB = point[dim]

The actual search involves a while loop that tests whether the search position is still

within bounds. The first search is backwards, i.e. towards zero, using posA. The value of

posA is decreased by one to get the previous point in the data and this is set within the

correct dimensional index of testPoint, so that we can access the absolute value of the

(possibly multi-dimensional) data. Note NumPy requires that we convert testPoint to a

tuple for use as an index of data.

prevValue = height

while posA > 0: # Search backwards

posA -= 1

testPoint[dim] = posA

value = abs(data[tuple(testPoint)])

At the end of the loop there is a check to see if the half-height has been reached,

breaking out of the loop if it has. Linear interpolation is almost always needed, because

the half-height is unlikely to occur exactly on a grid point. Hence, if the data value is less

than the half-height we know we’ve over-stepped, and we add back a fraction of a point.

This fraction is calculated as the drop in height of value below halfHt as a proportion of

the drop in height from one point to the next.

if value <= halfHt:

posA += (halfHt-value)/(prevValue-value)

break

prevValue = value

And we repeat the procedure for posB, searching in the other direction, though this time

the last data point in the data is recorded from the .shape of the data array (its size) so that

we know when to stop. Then at the end we pass back the two positions that correspond to

the half-height.

lastPoint = data.shape[dim] - 1 # Shape is size

prevValue = height

while posB < lastPoint-1: # Search forwards

posB += 1

testPoint[dim] = posB

value = abs(data[tuple(testPoint)])

if value <= halfHt:

posB -= (halfHt-value)/(prevValue-value)

break

prevValue = value

return posA, posB

Download 7,75 Mb.

Do'stlaringiz bilan baham:

1 ... 291 292 293 294 295 296 297 298 ... 514