Figure 16.3 (Plate 4). The results of various array comparison procedures. Shown
from left to right are: the product of the red and green channels, displayed as yellow, to
illustrate the coincidence between channels; the G-test scores, red × log2(red/green),
which are designed to show where the values in the two channels are different and of
significant value; and hierarchical clustering, used to shuffle the rows and columns of the
test data, which here shows that there are replicates for each row.
As another example, next we find the logarithm (here in base 2) of the ratio of the red
and green channels. Combining two-channel red and green microarray data in this way is
commonplace (for example, when making ‘MA’ plots). To achieve this we first define a
small helper function log2Ratio, which accepts two input data arrays and gives back the
combined comparison array. Note that we take a copy of the original data and add a
small amount to each array, to ensure that we do not divide by zero or take the logarithm
of zero:
from numpy import array, log2
def log2Ratio(data1, data2):
  # Copy the inputs and shift them away from zero
  data1 = array(data1) + 1e-3
  data2 = array(data2) + 1e-3
  return log2(data1/data2)
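As a quick check of the helper outside the microarray class, the sketch below applies the same calculation to two small, made-up intensity lists. The values of red and green are purely illustrative and are not taken from the test image; the function definition is repeated only so that the example runs on its own.

```python
from numpy import array, log2

def log2Ratio(data1, data2):
    # Same calculation as in the text: copy, offset, then log2 of the ratio
    data1 = array(data1) + 1e-3
    data2 = array(data2) + 1e-3
    return log2(data1/data2)

# Hypothetical red and green intensities for four spots (illustrative only)
red = [4.0, 1.0, 0.0, 2.0]
green = [1.0, 4.0, 0.0, 2.0]

ratios = log2Ratio(red, green)
print(ratios)
```

Spots that are brighter in red give positive values, spots that are brighter in green give negative values of the same size, and equal intensities give zero. The third spot shows why the offset is there: two zeros produce a ratio of one rather than an undefined result.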
We can use this function to combine the red and green channels (index 0 and 1), placing
the result in the blue channel (index 2).
rgArray = loadArrayImage(imgFile, 'TwoChannel', 18, 17)
rgArray.combineChannels(0, 1, combFunc=log2Ratio, replace=2)
The result can be visualised by selecting only the blue channel, here making a greyscale
image.
rgArray.makeImage(20, channels=(2,2,2)).show()
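The same log ratio is the 'M' value of an 'MA' plot, where it is paired with 'A', the mean of the two log intensities. A minimal sketch of how both values might be computed with plain NumPy follows. The helper maValues and its offset argument are hypothetical names introduced for illustration and are not part of the microarray class used above; the input intensities are made up.

```python
from numpy import array, log2

def maValues(red, green, offset=1e-3):
    # Hypothetical helper, not part of the microarray class in the text.
    # The offset plays the same role as in log2Ratio: it avoids log(0).
    red = array(red, float) + offset
    green = array(green, float) + offset
    m = log2(red/green)          # M: log ratio of the two channels
    a = 0.5 * log2(red*green)    # A: mean of the two log intensities
    return m, a

# Illustrative intensities for three spots
red = [8.0, 2.0, 4.0]
green = [2.0, 8.0, 4.0]

mVals, aVals = maValues(red, green)
print(mVals)
print(aVals)
```

Swapping the channels of a spot flips the sign of M but leaves A unchanged, which is why an MA plot separates the direction of a change from the overall brightness of the spot.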
A further alternative to show differences between the two channel intensities is to use
logarithms to calculate x*log(x/y), where x and y are two colour channels, which is a
convenient way of showing the information content of one distribution over another and
which is used in the G-test (see Chapter 22). Similar to the previous example, the red and
green channel arrays are both shifted away from zero by a small amount, so that zeros do
not occur in the division that will follow.
from numpy import array, log2
def gScore(data1, data2):
  # Copy the inputs and shift them away from zero
  data1 = array(data1) + 1e-3
  data2 = array(data2) + 1e-3
  return data1 * log2(data1/data2)
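To see how this score behaves, the sketch below evaluates it on a few made-up intensities and compares it with the natural-logarithm form. The conventional G statistic is written with natural logarithms, so the base-2 score used here differs from that form only by a constant factor of ln 2. The arrays red and green are illustrative assumptions, and the function is repeated so the example is self-contained.

```python
from numpy import array, log, log2

def gScore(data1, data2):
    # Same calculation as in the text: x * log2(x/y) after a small offset
    data1 = array(data1) + 1e-3
    data2 = array(data2) + 1e-3
    return data1 * log2(data1/data2)

# Hypothetical intensities for four spots (illustrative only)
red = array([4.0, 1.0, 2.0, 0.0])
green = array([1.0, 4.0, 2.0, 0.0])

scores = gScore(red, green)

# The same quantity using natural logarithms, for comparison
natural = (red + 1e-3) * log((red + 1e-3)/(green + 1e-3))

print(scores)
print(scores * log(2.0) - natural)  # differences should be negligible
```

Because the log ratio is weighted by the first channel, a bright red spot that differs from green scores much higher in magnitude than a dim red spot with the same ratio in the other direction, while equal intensities score zero.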
This can be tested as before, though we normalise the values so the logarithms are
scaled into the same range as the other channels:
rgArray = loadArrayImage(imgFile, 'TwoChannel', 18, 17)
rgArray.combineChannels(0, 1, combFunc=gScore, replace=2)
rgArray.normaliseMax(perChannel=True)
rgArray.makeImage(20, channels=(2,2,2)).show()
With values calculated in this way we can perform significance tests, as described in
Chapter 22, given that the random expectation is that values will be chi-square distributed.
Here the comparative term has been calculated from the perspective of the red channel,
but we could also do it for green (i.e. green * log(green/red)).
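As an illustration of how the two perspectives might be combined into a single test, the sketch below sums the red and green terms for one spot into a G statistic and converts it to a p-value. This is only a sketch under stated assumptions and is not necessarily the procedure of Chapter 22: it assumes the two intensities can be treated as counts, that the null expectation is equal red and green signal, and that the statistic follows a chi-square distribution with one degree of freedom. The function name gTestTwoChannel is hypothetical.

```python
from math import erfc, log, sqrt

def gTestTwoChannel(red, green):
    # Hypothetical helper: G-test for one spot, treating intensities as counts.
    # Assumed null hypothesis: both channels share the same expected value.
    expected = (red + green) / 2.0
    g = 0.0
    for observed in (red, green):
        if observed > 0:  # a zero observation contributes nothing to the sum
            g += observed * log(observed/expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding errors
    # Upper tail of a chi-square distribution with one degree of freedom
    pValue = erfc(sqrt(g/2.0))
    return g, pValue

gSame, pSame = gTestTwoChannel(20.0, 20.0)
gDiff, pDiff = gTestTwoChannel(30.0, 10.0)
print(gSame, pSame)
print(gDiff, pDiff)
```

The natural logarithm is used here because that is the form in which the G statistic is compared with the chi-square distribution. Equal channels give a statistic of zero and a p-value of one, while a larger imbalance gives a larger statistic and a smaller p-value.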