A microarray is a means of performing many small-scale experiments on a sample at the
same time. These experiments will all be of the same kind, i.e. have the same experimental
design, but individual experiments will have different conditions or components. On the
whole these experiments will be physically arranged as spots in a rectangular grid on a
solid support, kept apart so that they cannot mix. The basic reason for doing things in this manner is to make things
quicker and easier. Lots of small experiments are performed at the same time, requiring a
proportionately small amount of sample and providing the same set of conditions for each
test (or at least very similar; there can be inhomogeneities across the array). Naturally, the
outcome of the experiments has to be detected at the end and the final state of the
microarray is generally measured using optical methods. Most microarrays are designed to
detect the binding of components from a sample to the different targets in the array by
using fluorescence. Here the binding causes an element of the array to glow when
irradiated with UV light. In terms of computing, what is important is that we know what
distinguishes the components of the different miniature experiments within the array, and
then at the end how much signal, e.g. fluorescence, is detected from each.
The actual solid support for the array of experiments is typically made of glass, plastic
or silicon and the experimental components are chemically bonded to its surface in a small
regular array (placed there by machine). The components are generally bio-molecules,
such as DNA, protein or even glycans (poly-sugars), but could also be samples of cells
(i.e. a tissue microarray) or even small molecules. In the case of DNA microarrays the
DNA strands of differing sequences are immobilised, with one sequence to each spot, on
the solid matrix and bind to complementary single-stranded nucleic acids, i.e. they hybridise
through base-pair interactions. The samples that are applied to such an array will contain
mixtures of fluorescently labelled DNA strands. Those with sequences that are
complementary to a spot hybridise to it, causing that part of the array to emit a certain
colour of light when illuminated with UV light. Naturally, to know which DNA
sequences have been detected in this way requires that the sequence of each spot in the
array is known. For protein microarrays the situation is similar but the array spots are
immobilised proteins, commonly antibodies, which detect other molecules in a specific
manner via non-covalent interactions.
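The DNA hybridisation described above can be illustrated with a minimal sketch. It
assumes, for simplicity, that a labelled strand binds a spot only when it is the exact
reverse complement of the immobilised sequence; real hybridisation tolerates some
mismatches. The function names and the probes dictionary are hypothetical, chosen
only for this illustration.

```python
# Watson-Crick pairing partner for each DNA base
COMPLEMENT = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}


def reverse_complement(seq):
    """Return the strand that pairs perfectly with seq, read 5' to 3'."""
    return ''.join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridising_spots(probes, sample_seq):
    """Return names of spots whose probe is exactly complementary to sample_seq.

    probes is a hypothetical dict of {spot name: immobilised sequence}.
    Exact matching is a simplifying assumption, not a model of real binding.
    """
    target = reverse_complement(sample_seq)
    return [name for name, probe in probes.items() if probe.upper() == target]


# Three spots, each with a known sequence, and one labelled sample strand
probes = {'spotA': 'TGATCC', 'spotB': 'GGATCA', 'spotC': 'AAAAAA'}
print(hybridising_spots(probes, 'GGATCA'))  # ['spotA']
```

Note that spotB, which carries the same sequence as the sample strand, is not
reported: a strand pairs with its complement, not with a copy of itself.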
Whatever the type of array component (be it DNA, protein or something else) and however it
is detected we will use the same basic Python data structure to represent all kinds of
microarray; they all have an array of spot elements in a rectangular grid and they are all
detected by means of some kind of scalar signal. Although this abstract description can be
applied in several situations, it could naturally be customised or extended for more
specialised purposes. It should be noted that we have chosen to associate the array with
parallel data layers, e.g. for red and green fluorescence channels, which is commonplace
for microarrays. Accordingly an array element may be associated with several different
signal values and the system is flexible, so that we can describe anything from a single
array of values to multiple layers representing different kinds of both processed and
unprocessed data.
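To make the idea of parallel data layers concrete, here is a minimal sketch of such a
structure. It is only an illustration under stated assumptions and not necessarily the
class used elsewhere in this text: the class name, its methods (add_layer, signal,
add_log_ratio) and the layer names are all hypothetical. Plain nested lists are used so
that the example is self-contained; a NumPy array would be a natural alternative.

```python
from math import log2


class Microarray:
    """Sketch of a microarray: named data layers over one rectangular grid."""

    def __init__(self, name, n_rows, n_cols, spot_labels=None):
        self.name = name
        self.n_rows = n_rows
        self.n_cols = n_cols
        # What each spot contains, e.g. a probe sequence or antibody name
        self.spot_labels = spot_labels or {}  # {(row, col): label}
        # Parallel layers, each a full grid of scalar signal values
        self.layers = {}                      # {layer name: list of rows}

    def add_layer(self, layer_name, values):
        """Attach a grid of values, checking that it matches the array shape."""
        if len(values) != self.n_rows or any(len(r) != self.n_cols for r in values):
            raise ValueError('layer shape does not match the array grid')
        self.layers[layer_name] = [list(row) for row in values]

    def signal(self, layer_name, row, col):
        """Return the scalar signal for one spot in one layer."""
        return self.layers[layer_name][row][col]

    def add_log_ratio(self, new_name, layer_a, layer_b):
        """Derive a processed layer, log2(a/b), assuming all signals are positive."""
        a, b = self.layers[layer_a], self.layers[layer_b]
        ratio = [[log2(x / y) for x, y in zip(row_a, row_b)]
                 for row_a, row_b in zip(a, b)]
        self.add_layer(new_name, ratio)


# A 2x2 array with two unprocessed channels and one derived layer
array = Microarray('demo', 2, 2, {(0, 0): 'geneA', (0, 1): 'geneB'})
array.add_layer('red', [[8.0, 2.0], [4.0, 1.0]])
array.add_layer('green', [[2.0, 2.0], [1.0, 4.0]])
array.add_log_ratio('log_ratio', 'red', 'green')
print(array.signal('log_ratio', 0, 0))  # 2.0, i.e. red is four times green
```

Keeping processed layers alongside the raw channels, rather than overwriting them,
means the original measurements remain available for later comparison.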
What we gain from microarrays is a measure of interaction or reaction for each of the
spots in the array. An array will tell us how strongly a given sample interacts with each
spot component. One of the Python examples, which performs hierarchical clustering, will analyse
this further to show similarities within a microarray. This is a common way to indicate
visually the similarities between rows and columns. Here we can look back to some of the
phylogenetic tree-building code and borrow a function, illustrating the usefulness of
keeping Python functions abstract and general. We also show how you can look for
similarities and differences in array data, for example by comparing different colour
channels. In such circumstances it can be important to use controls and normalise the data,
to test whether the detection worked equally well in each case and to remove systematic
error. With this in mind several of the following examples are based on normalisation
techniques which will allow the comparison of different data arrays even if the overall