can be an issue. The process here is to make the distribution of data values in the array
match some other, external distribution. This other distribution could be different
microarray data or a mathematical distribution like a normal distribution (Gaussian). The
matching of distributions is achieved by replacing each real microarray data value with the
value from the reference distribution that has equal rank, so the highest value is replaced
by the highest reference value, the second highest with the second highest reference and so
on. While this may seem a little like cheating, quantile normalisation is especially useful if
you suspect that the distribution of values in the microarray has been distorted or skewed,
but that the rank order of the values still conveys information.
The quantile normalisation procedure can be done by using NumPy as we illustrate
below. The objective is to replace items in values by selecting items with the equivalent
rank from refData. Note that sorting the replacement values is not enough on its own: each
one must be placed at the position that the equally ranked value occupies in the original
data. First the data array is flattened into a
one-dimensional vector and the indices of the values are extracted in size order (.argsort()
does this). Hence, order represents the selection that sorts values. To take an example, if
the flattened data is [2.5, 7.1, 0.0, 5.9] then the indices order is [2, 0, 3, 1] (2 is the
position of the smallest value, 0 the position of the next smallest etc.).
values = self.data[channel].flatten()
order = values.argsort()
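The small example from the text can be checked directly. This is only an illustrative sketch using a stand-alone array rather than self.data; the variable names follow the text.

```python
import numpy as np

# The flattened example data used in the text
values = np.array([2.5, 7.1, 0.0, 5.9])

# Indices that would put values into increasing order
order = values.argsort()

print(order.tolist())          # [2, 0, 3, 1]
print(values[order].tolist())  # [0.0, 2.5, 5.9, 7.1]
```

Indexing values with order gives the sorted data, which is why order can be described as the selection that sorts values.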
Similarly the reference distribution refData (assumed to be an array with the same number of
elements as the data being normalised) is flattened into a vector, refValues. Then refValues
is sorted, putting its
elements into size (and hence rank) order, so that we obtain an array of replacement
values. The array of indices in original value order (order) is itself subject to .argsort().
This may seem confusing but what you get is an array of the ranks of each value, and thus
a mapping from the original values to the replacement reference values. For example, if
values is [2.5, 7.1, 0.0, 5.9] then the refSelection is [1, 3, 0, 2], where each number is the
size rank (starting at zero) of the equivalent data value. Once defined, refSelection allows
us to redefine values by taking the reference values in the rank order of the original data.
Finally the data for the channel, self.data[channel], is replaced by values arranged back
into the original shape.
refValues = refData.flatten()
refValues.sort()
refSelection = order.argsort()
values = refValues[refSelection]
self.data[channel] = values.reshape((self.nRows, self.nCols))
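The steps above can be collected into a stand-alone function to see the whole procedure at work on a tiny array. This is a minimal sketch, not the class method itself: the function name quantile_normalise, its arguments and the size check are assumptions made for illustration.

```python
import numpy as np

def quantile_normalise(data, ref_data):
    """Replace each value in data with the equally ranked value of ref_data.

    Hypothetical helper for illustration; both arrays are assumed to
    contain the same number of elements.
    """
    values = data.flatten()
    ref_values = np.sort(ref_data, axis=None)  # flattened, sorted copy
    if values.size != ref_values.size:
        raise ValueError('data and ref_data must have the same number of elements')

    order = values.argsort()         # selection that sorts the data
    ref_selection = order.argsort()  # rank (from zero) of each data value

    return ref_values[ref_selection].reshape(data.shape)

data = np.array([[2.5, 7.1],
                 [0.0, 5.9]])
ref = np.array([[10.0, 40.0],
                [30.0, 20.0]])

result = quantile_normalise(data, ref)
print(result.tolist())  # [[20.0, 40.0], [10.0, 30.0]]
```

Here the largest data value (7.1) receives the largest reference value (40.0) and the smallest (0.0) receives the smallest (10.0). Note that tied data values are given distinct, adjacent ranks by .argsort(), so they may receive slightly different reference values.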
We can do a similar thing to quantile normalise each row separately. However, here
we can use an internal reference distribution, which is the average for all the rows. We do
not flatten the data arrays into a vector as each row is a vector and is dealt with separately.
Accordingly we determine the order of elements of increasing value in each row (orders).
The refValues array is defined by sorting the values in each row and then averaging down each
column of the sorted array (so each entry is the average of the values with equivalent rank
from every row). The values in each row of self.data are then replaced with the refValues
averages of matching rank.
def normaliseRowQuantile(self, channel=0):
    # Assumes 'from numpy import array' at the top of the module
    channelData = self.data[channel]
    orders = channelData.argsort(axis=1)
    sortedRows = array(channelData)  # copy, so the original is not sorted in place
    sortedRows.sort(axis=1)
    refValues = sortedRows.mean(axis=0) # average down each column, i.e. across rows
    rows = range(self.nRows)
    self.data[channel,rows,:] = refValues[orders[rows,:].argsort()]
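To see what the row-wise version produces, the same logic can be sketched as a plain function working on a small two-dimensional array. The function name row_quantile_normalise is an assumption for illustration, and it returns a new array rather than modifying self.data.

```python
import numpy as np

def row_quantile_normalise(channel_data):
    """Quantile normalise each row against the average sorted row.

    Hypothetical stand-alone version of the method described in the text.
    """
    orders = channel_data.argsort(axis=1)        # sorting selection for each row
    sorted_rows = np.sort(channel_data, axis=1)  # sorted copy of each row
    ref_values = sorted_rows.mean(axis=0)        # one average per rank position
    ranks = orders.argsort(axis=1)               # rank of each value within its row
    return ref_values[ranks]

data = np.array([[5.0, 1.0, 3.0],
                 [2.0, 8.0, 4.0]])

print(row_quantile_normalise(data).tolist())
# [[6.5, 1.5, 3.5], [1.5, 6.5, 3.5]]
```

The sorted rows are [1, 3, 5] and [2, 4, 8], so the reference values are their averages, [1.5, 3.5, 6.5]. Every row ends up containing exactly these three values, arranged according to the rank order of the original row.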
We can test the quantile normalisation using example data loaded from an image. For
the reference we will use the data in layer 1 (green) to normalise layer 0 (red).
imgFile = 'examples/RedGreenArray.png'
rgArray = loadArrayImage(imgFile, 'TwoChannel', 18, 17)
rgArray.normaliseQuantile(rgArray.data[1], 0)
rgArray.makeImage(25).show()
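If the example image is not to hand, the behaviour can still be checked with synthetic data. The following sketch makes several assumptions: random arrays stand in for the red and green layers, the shape (18, 17) is taken to be the row and column counts from the call above, and a Gaussian sample is used as the reference distribution.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Stand-ins for the two layers (assumed shape: 18 rows, 17 columns)
red = rng.exponential(scale=2.0, size=(18, 17))         # skewed data
green = rng.normal(loc=0.0, scale=1.0, size=(18, 17))   # Gaussian reference

# Same steps as in the text: rank the data, then pick sorted reference values
flat = red.flatten()
ranks = flat.argsort().argsort()
normalised = np.sort(green, axis=None)[ranks].reshape(red.shape)

# The normalised data should hold exactly the reference values...
same_distribution = np.array_equal(np.sort(normalised, axis=None),
                                   np.sort(green, axis=None))
# ...while keeping the rank order of the original data
same_order = np.array_equal(normalised.flatten().argsort(), flat.argsort())

print(same_distribution, same_order)
```

Both checks are expected to succeed when there are no tied values: the distribution is taken entirely from the reference, and only the ordering comes from the original data.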