Extracting array image data
The next example reads raw array data from a pixelated image, i.e. a picture of the whole
array, which contains separate layers of data recorded, or at least stored, as separate colour
channels. Each colour channel records a separate signal for the same spots, because the two
samples were labelled with different fluorescent dyes that can be assayed independently.
The input file is read as a pixmap image containing red, green and blue (RGB) colour
channels, using functionality that is discussed more fully in Chapter 18. Because we are
dealing with image data, we import from the Python Imaging Library (PIL), which handles
all the tricky tasks of making image pixmaps and saving image data to a file (you may like
to skip ahead to Chapter 18 to learn more about images). Note that PIL is not part of the
standard Python library and must be installed separately; in current Python installations
it is usually provided by the Pillow package, which is still imported under the name PIL.
We also import a function imageToPixmapRGB from the Images module (part of the
downloadable data that accompanies this book), which converts the image data into a
NumPy array. And, as you may expect, we import some NumPy functions to manipulate
numeric array data.
from PIL import Image
from Images import imageToPixmapRGB
from numpy import array, dstack, transpose, uint8, zeros
The array import function itself takes the name of the file to load and a human-readable
name for the data. We also specify the number of rows and, optionally if it differs, the
number of columns that the image represents. While it is certainly possible to use image
processing to guess where the circular spots in the array image are located, it is far
easier to specify the grid size up front and then simply subdivide the image into equally
sized rectangles corresponding to the rows and columns. Here we simply take the signal
for each spot as the total signal within its grid cell, though this could be refined by
fitting circles, removing background noise and so on.
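Where the image height and width happen to be exact multiples of the grid dimensions, the idea of taking the total signal within each grid cell can be expressed without explicit loops by reshaping the pixel array. The sketch below is an alternative for that special case, not the method used in this section; gridTotalsExact is a hypothetical name, and it assumes a pixmap of shape (height, width, 3) like the one used in the function.

```python
import numpy as np

def gridTotalsExact(pixmap, nRows, nCols):
  """Total each colour channel within every grid cell of a pixmap.

  Hypothetical helper: assumes pixmap has shape (height, width, 3) and that
  height and width are exact multiples of nRows and nCols, so every cell is
  an equal rectangle of whole pixels. Returns shape (nRows, nCols, 3).
  """
  height, width, depth = pixmap.shape
  cellH = height // nRows
  cellW = width // nCols
  # Split each image axis into (grid index, pixel position within the cell)
  blocks = pixmap.reshape(nRows, cellH, nCols, cellW, depth)
  # Sum over the within-cell axes, keeping grid row, grid column and colour
  return blocks.sum(axis=(1, 3))

# A 4x6-pixel image where every pixel is (1, 2, 3), on a 2x3 grid of 2x2 cells
pixmap = np.ones((4, 6, 3), dtype=np.int64) * np.array([1, 2, 3])
totals = gridTotalsExact(pixmap, 2, 3)
print(totals.shape)  # (2, 3, 3)
```

Each cell here holds four pixels, so every cell total is four times the pixel value in each channel. For sizes that do not divide evenly the reshape fails, which is one reason the function below uses explicit start and end positions instead.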
def loadArrayImage(fileName, sampleName, nRows, nCols=None):
If the number of data columns was not specified when the function is called, we set it
equal to the number of rows. The numeric matrix that will hold the signal information,
dataMatrix, is then constructed as an array of zeros of the required size. Note that the
first axis has three layers, placed before the rows and columns to give the shape (3,
nRows, nCols); these layers store the separate colour components. It is a matter of taste
whether the colour layers use the first or last axis of the array, but here we put them
first because it makes the code slightly simpler overall, even though this is the opposite
of how the data is stored in the image.
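The difference between the two layouts can be seen on a tiny made-up array: an image keeps colour on the last axis, (height, width, 3), whereas dataMatrix keeps it on the first. NumPy's moveaxis, or transpose with an explicit axis order, converts one layout into the other; the pixel values below are invented purely for illustration.

```python
import numpy as np

# Made-up 2x2-pixel RGB image in the layout an image uses:
# (height, width, colour), i.e. colour on the LAST axis
pixmap = np.array([[[255, 0, 0], [0, 255, 0]],
                   [[0, 0, 255], [10, 20, 30]]], dtype=np.uint8)

# Move the colour axis to the front: (colour, rows, cols), as in dataMatrix
layers = np.moveaxis(pixmap, -1, 0)
# transpose with an explicit axis order gives the same result
sameLayers = np.transpose(pixmap, (2, 0, 1))

print(pixmap.shape, layers.shape)  # (2, 2, 3) (3, 2, 2)
red, green, blue = layers  # iterating an array walks along its first axis
```

With colour first, each channel is a plain 2D grid (for example red above), which is what makes selecting a whole layer of dataMatrix convenient.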
  if not nCols:
    nCols = nRows
  dataMatrix = zeros((3, nRows, nCols), float)
Using the imported modules, an object representing the image is generated from the
input file with the Image.open() function, which detects the file type automatically, and
this is then converted to a numeric array with the imageToPixmapRGB() function from
Chapter 18.
  img = Image.open(fileName) # Automatic file type
  pixmap = imageToPixmapRGB(img)
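The internals of imageToPixmapRGB are not shown in this section. Judging by how its result is used next (its shape is unpacked as height, width, depth), it returns an array of shape (height, width, 3). A minimal stand-in, assuming Pillow is installed, might look like the following; pixmapFromImage is a hypothetical name, and Image.new is used in place of Image.open only so that no file is needed.

```python
import numpy as np
from PIL import Image

def pixmapFromImage(img):
  """Hypothetical stand-in for imageToPixmapRGB: return an array of
  shape (height, width, 3) holding the red, green and blue channels."""
  # convert('RGB') drops any alpha channel and expands palette or greyscale
  # images, so the result always has exactly three colour channels
  return np.asarray(img.convert('RGB'))

# An in-memory image 5 pixels wide and 3 high, filled with one colour
img = Image.new('RGB', (5, 3), (200, 100, 50))
pixmap = pixmapFromImage(img)
print(pixmap.shape, pixmap.dtype)  # (3, 5, 3) uint8
```

Note that Pillow gives sizes as (width, height), whereas the NumPy array is indexed as (row, column), so the height comes first in the array shape.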
The size of the pixel data along each of its axes is easily determined from the numeric
array. By dividing the total image width and height by the number of columns and rows
respectively, we get a measure of the grid size, which we use to subdivide the image
data. We calculate both floating point grid sizes (dx, dy) and integer sizes (xSize, ySize),
because we need precise values to define the grid start points but a fixed number of pixels
to find the end points, so that all blocks have equal area. The integer size
1 + (width-1)//nCols is width/nCols rounded up rather than down: adding one makes each
block span at least dx pixels, given that a slice of the image array runs up to, but not
including, its end value, while subtracting one before the division stops the result being
one pixel too large when width is an exact multiple of nCols.
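The grid arithmetic can be checked on its own with a hypothetical helper, gridBounds, which applies the same dx/xSize scheme to a single axis and returns the (start, end) slice bounds of each cell. The sketch also confirms that 1 + (size-1)//n equals size/n rounded up, written here in the common Python idiom -(-size//n).

```python
def gridBounds(size, nCells):
  """Hypothetical helper: (start, end) slice bounds dividing `size` pixels
  into nCells, using the float step (dx) and integer cell size (xSize)."""
  step = size / float(nCells)          # precise grid spacing, like dx or dy
  cellSize = 1 + (size - 1) // nCells  # whole pixels per cell, like xSize
  bounds = []
  for i in range(nCells):
    start = int(i * step)              # truncate the precise start position
    bounds.append((start, start + cellSize))
  return bounds

# 1 + (size-1)//n is integer ceiling division: size/n rounded up
for size, n in [(10, 5), (10, 3), (11, 3)]:
  assert 1 + (size - 1) // n == -(-size // n)

print(gridBounds(10, 3))  # [(0, 4), (3, 7), (6, 10)]
```

For 10 pixels split into 3 cells, every block is 4 pixels wide and the last one ends exactly at pixel 10, but neighbouring blocks overlap by one pixel (pixels 3 and 6 each fall in two blocks).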
  height, width, depth = pixmap.shape
  dx = width/float(nCols) # float() not needed in Python 3
  dy = height/float(nRows)
  xSize = 1 + (width-1)//nCols
  ySize = 1 + (height-1)//nRows
Looping through each microarray row, the first pixel position for that image section
(yStart) is calculated by multiplying the row number by the row height in the image (dy)
and converting to an integer. The end position (yEnd), which is just beyond the last pixel
included in the slice, is calculated as the start plus the integer grid height (ySize).
  for row in range(nRows):
    yStart = int(row*dy)
    yEnd = yStart + ySize
Similarly, within each row we calculate the range of pixels to select a column of data
from the image.
    for col in range(nCols):
      xStart = int(col*dx)
      xEnd = xStart + xSize
The data corresponding to an individual microarray grid element (i.e. spot) is a
rectangular region of pixels sliced from the image pixmap, using the row and column
bounds just calculated. The data from this sub-region is summed along both the height and
width axes of the array (axes 0 and 1, but not the colour axis), hence we use
.sum(axis=(0,1)) to give the total signal for the grid element in each colour channel. This
is then stored in dataMatrix at the required row and column, noting that the ':'
specification for the first axis of the array means that we set all the colour channels at
the same time.
      elementData = pixmap[yStart:yEnd, xStart:xEnd]
      dataMatrix[:, row, col] = elementData.sum(axis=(0,1))
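A small made-up example shows what the slice, the sum over axes 0 and 1 and the ':' assignment do for a single grid element. The pixel values and bounds are invented for illustration; note that when summing uint8 image data NumPy accumulates in the default platform integer, so totals above 255 do not wrap around.

```python
import numpy as np

# Made-up 4x4-pixel pixmap (height, width, colour), uint8 like image data
pixmap = np.zeros((4, 4, 3), dtype=np.uint8)
pixmap[..., 0] = 200  # strong red signal in every pixel
pixmap[..., 1] = 100  # weaker green signal; blue stays 0

dataMatrix = np.zeros((3, 2, 2), float)  # (colour, nRows, nCols)
row, col = 1, 0
yStart, yEnd = 2, 4  # bottom half of the image
xStart, xEnd = 0, 2  # left half of the image

elementData = pixmap[yStart:yEnd, xStart:xEnd]  # shape (2, 2, 3)
# Sum over height (axis 0) and width (axis 1), leaving one total per channel;
# the uint8 values are promoted before summing, so 4*200 does not wrap at 255
totals = elementData.sum(axis=(0, 1))
dataMatrix[:, row, col] = totals  # sets all three colour layers at once
print(dataMatrix[:, row, col])
```

The grid element at row 1, column 0 receives totals of 800, 400 and 0 for red, green and blue, while the other elements of dataMatrix remain zero.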
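Because every block is xSize pixels wide and the last column starts at int((nCols-1)*dx),
which works out to width - xSize, the last block ends exactly at the edge of the image and
every grid cell sums the same number of pixels. When width is not a multiple of nCols,
neighbouring columns can therefore share a one-pixel-wide strip of the image, and
similarly for rows when height is not a multiple of nRows. Finally, at the end of the
function we create a Microarray object, as described below, with its name and data array.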
  return Microarray(sampleName, dataMatrix)
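For experimenting without an image file, the body of loadArrayImage can be lifted into a hypothetical function, gridSignals, that takes a pixmap array directly and returns dataMatrix rather than a Microarray object (whose definition is not part of this section). Apart from those two assumptions, the logic follows the function above step by step.

```python
import numpy as np

def gridSignals(pixmap, nRows, nCols=None):
  """Hypothetical, file-free version of the loop in loadArrayImage.

  Sums each colour channel of a (height, width, 3) pixmap over an
  nRows x nCols grid and returns an array of shape (3, nRows, nCols).
  """
  if not nCols:
    nCols = nRows

  dataMatrix = np.zeros((3, nRows, nCols), float)
  height, width, depth = pixmap.shape
  dx = width / float(nCols)       # precise column spacing
  dy = height / float(nRows)      # precise row spacing
  xSize = 1 + (width - 1) // nCols   # whole pixels per column, rounded up
  ySize = 1 + (height - 1) // nRows  # whole pixels per row, rounded up

  for row in range(nRows):
    yStart = int(row * dy)
    yEnd = yStart + ySize
    for col in range(nCols):
      xStart = int(col * dx)
      xEnd = xStart + xSize
      elementData = pixmap[yStart:yEnd, xStart:xEnd]
      dataMatrix[:, row, col] = elementData.sum(axis=(0, 1))

  return dataMatrix

# A uniform image of height 10 and width 7 on a 3x2 grid: ySize = xSize = 4
pixmap = np.ones((10, 7, 3), dtype=np.uint8)
signals = gridSignals(pixmap, 3, 2)
print(signals.shape)  # (3, 3, 2)
```

Because the test image is uniform, every cell total equals the number of pixels in that cell; here each cell sums the same 4 x 4 = 16 pixels, even though neither 10 nor 7 is a multiple of the grid size.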