Python Programming for Biology: Bioinformatics and Beyond

Extracting array image data

[Tim J. Stevens, Wayne Boucher] Python Programming


The next example reads raw array data from a pixelated image, i.e. a picture of the whole array, which contains separate layers of data recorded, or at least stored, as separate colour channels. Each colour channel records a separate signal for the same spots, given that two samples were labelled with different fluorescent dyes that can be assayed independently.

The input file is read as a pixmap image that contains red, green and blue (RGB) colour channels, using functionality that is discussed more fully in Chapter 18 (so you may like to skip ahead to learn more about images). Because we are dealing with image data we import from the Python Imaging Library (PIL, nowadays installed as the Pillow package) to handle all the tricky tasks of making image pixmaps and saving the image data to a file. It should be noted that this is not part of the standard Python library and must be installed separately. We also import a function imageToPixmapRGB from the Images module (part of the downloadable data that accompanies this book) that will convert the image data into a NumPy array. And, as you may expect, we import some NumPy functions to manipulate numeric array data.

from PIL import Image
from Images import imageToPixmapRGB
from numpy import array, dstack, transpose, uint8, zeros

The array import function itself takes the name of the file to load and a human-readable name for the data. We also specify the number of rows that the image represents and, optionally, the number of columns if this differs from the number of rows. While it is certainly possible to do image processing to guess where the circular spots in the array image are located, it is far easier to specify the grid size upfront and then simply subdivide the image into equally sized rectangles, corresponding to the rows and columns. Here we will simply take the signal for each spot as the total amount of signal within each grid cell, though this could be refined by fitting circles, removing noise etc.

def loadArrayImage(fileName, sampleName, nRows, nCols=None):

If the number of data columns was not specified when the function is called we set it to be equal to the number of rows. The numeric matrix that will contain the signal information, dataMatrix, is then constructed initially as an array of zeros of the required size, noting that the first axis has three layers before we specify rows and columns, (3, nRows, nCols), which will be used to store the separate colour components. It is a matter of taste whether the different layers use the first or last axis of the array, but here we put it first because it makes the code slightly simpler overall, even though this is the opposite of how the data is stored in the image.

  if not nCols:
    nCols = nRows

  dataMatrix = zeros((3, nRows, nCols), float)
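As a small illustration of the default handling and the layer-first array shape, the following sketch wraps the two steps in a helper. The name makeDataMatrix is hypothetical and is not part of the book's code; it only assumes that NumPy is installed.

```python
from numpy import zeros

def makeDataMatrix(nRows, nCols=None):
    # Hypothetical helper: mirrors the default handling described above
    if not nCols:
        nCols = nRows  # Square grid when no column count is given

    # Colour layer first, then grid rows, then grid columns
    return zeros((3, nRows, nCols), float)

squareMatrix = makeDataMatrix(4)   # nCols defaults to nRows
wideMatrix = makeDataMatrix(4, 6)  # Explicit column count

print(squareMatrix.shape, wideMatrix.shape)
```

Because the colour layer is the first axis, squareMatrix[0] is the complete grid of values for the first colour channel.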

Using the imported modules, an object representing the image is generated from the input file with the Image.open() function, and this is then converted to a numeric array with the function from Chapter 18:

  img = Image.open(fileName) # Automatic file type
  pixmap = imageToPixmapRGB(img)
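The imageToPixmapRGB function belongs to the book's Images module and is not reproduced here. As a rough stand-in, assuming Pillow and NumPy are installed, the conversion might be sketched as below. The name imageToPixmapSketch and the choice of a float array are assumptions made for illustration, not the book's implementation.

```python
from PIL import Image
import numpy as np

def imageToPixmapSketch(img):
    # Hypothetical stand-in for the book's imageToPixmapRGB function
    rgbImage = img.convert('RGB')            # Force three colour channels
    return np.array(rgbImage).astype(float)  # Shape: (height, width, 3)

# A tiny synthetic image: 4 pixels wide, 2 pixels high, one uniform colour
testImage = Image.new('RGB', (4, 2), (10, 20, 30))
testPixmap = imageToPixmapSketch(testImage)

print(testPixmap.shape)  # Rows (height) first, then columns, then colour
```

Note that PIL reports image sizes as (width, height), whereas the array axes run (height, width, colour).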

The size of the pixel data along each of its axes is easily determined from the numeric array. By dividing the total image width and height by the number of columns and rows respectively we get a measure of the grid size, which we will use to subdivide the image data. We calculate both floating point grid sizes (dx, dy) and integer sizes (xSize, ySize) because we need precise values to define the grid start points but a fixed number of pixels to find the end points, and thus give blocks of equal area. Note the integer size calculation involves adding one pixel because we will be taking a slice out of the image array up to, but not including, the end value, but that this also means we subtract one prior to division to avoid overshooting the edge of the image.

  height, width, depth = pixmap.shape

  dx = width/float(nCols) # float() not needed in Python 3
  dy = height/float(nRows)
  xSize = 1 + (width-1)//nCols
  ySize = 1 + (height-1)//nRows
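To make the arithmetic concrete, here is the same calculation for an invented image of 10 by 9 pixels divided into 4 columns and 3 rows. The sizes are purely illustrative and Python 3 division is assumed.

```python
# Hypothetical image and grid sizes, chosen only to illustrate the arithmetic
width, height = 10, 9
nCols, nRows = 4, 3

dx = width / nCols                # 10/4 = 2.5 pixels per column (float)
dy = height / nRows               # 9/3 = 3.0 pixels per row (float)
xSize = 1 + (width-1)//nCols      # 1 + 9//4 = 1 + 2 = 3
ySize = 1 + (height-1)//nRows     # 1 + 8//3 = 1 + 2 = 3

print(dx, dy, xSize, ySize)
```

In effect the integer sizes round the floating point grid sizes up to whole pixels: 2.5 becomes 3, while 3.0 stays at 3.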

Looping through each microarray row, the first pixel position for that image section (yStart) is calculated by multiplying the row number by the row height in the image (dy) and converting to an integer. The last pixel position will be just inside the limit (yEnd), which is calculated as the start plus the integer grid height (ySize).

  for row in range(nRows):
    yStart = int(row*dy)
    yEnd = yStart + ySize

Similarly, within each row we calculate the range of pixels to select a column of data from the image.

    for col in range(nCols):
      xStart = int(col*dx)
      xEnd = xStart + xSize
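The start and end calculation can be checked on its own for one axis. This sketch uses an invented width of 10 pixels and 4 columns, and simply lists the slice bounds that the loop would produce.

```python
# Hypothetical sizes for illustration: a 10 pixel wide image with 4 columns
width, nCols = 10, 4

dx = width / nCols              # 2.5
xSize = 1 + (width-1)//nCols    # 3

bounds = []
for col in range(nCols):
    xStart = int(col*dx)        # Truncates 0.0, 2.5, 5.0, 7.5 to 0, 2, 5, 7
    xEnd = xStart + xSize       # Slice end, not itself included
    bounds.append((xStart, xEnd))

print(bounds)  # [(0, 3), (2, 5), (5, 8), (7, 10)]
```

Every block here is three pixels wide and the final block ends exactly at the image edge. Because 10 is not a multiple of 4, some neighbouring blocks share a pixel column (columns 2 and 7 in this case).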

The data corresponding to an individual microarray grid element (i.e. spot) is a rectangular region of pixels sliced from the image pixmap, using the row and column bounds just calculated. The data from this sub-region is summed along both the width and height axes of the array (but not the colour axis), hence we use .sum(axis=(0,1)) to give the total signal for the grid element. This is then stored in dataMatrix at the required row and column, noting that the ‘:’ specification for the first axis of the array means that we are setting all the colour channels at the same time.

      elementData = pixmap[yStart:yEnd,xStart:xEnd]
      dataMatrix[:,row, col] = elementData.sum(axis=(0,1))
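The effect of summing over the first two axes while keeping the colour axis can be seen with a small synthetic pixmap. The array contents below are arbitrary values invented for the example, not microarray data.

```python
import numpy as np

# Synthetic pixmap: 4 pixels high, 6 wide, 3 colour channels, values 0..71
pixmap = np.arange(4*6*3, dtype=float).reshape(4, 6, 3)

# One grid element: the top-left block of 2 rows by 3 columns of pixels
elementData = pixmap[0:2, 0:3]

# Sum over height (axis 0) and width (axis 1), leaving one total per colour
channelTotals = elementData.sum(axis=(0, 1))
print(channelTotals)  # [72. 78. 84.]

# Store all three colour totals at grid position (row 0, column 0) at once
dataMatrix = np.zeros((3, 2, 2), float)
dataMatrix[:, 0, 0] = channelTotals
```

The result of the sum has one value per colour channel, which is why it can be assigned directly to the slice dataMatrix[:, row, col].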

Note that if width is not a multiple of nCols then every block is still xSize pixels wide, so some neighbouring blocks overlap by one pixel column and both include its signal; the same applies to rows if height is not a multiple of nRows. Finally, at the end of the function we create a Microarray object, as described below, with its name and data array.

  return Microarray(sampleName, dataMatrix)
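For experimenting without an image file or the Microarray class, the grid summation can be assembled into a self-contained sketch that works directly on a NumPy array. The function name gridSignals and the synthetic test arrays are hypothetical; the body follows the steps above, assuming Python 3 division.

```python
import numpy as np

def gridSignals(pixmap, nRows, nCols=None):
    # Hypothetical variant of loadArrayImage that takes a (height, width, 3)
    # array rather than a file name and returns the bare data matrix
    if not nCols:
        nCols = nRows

    dataMatrix = np.zeros((3, nRows, nCols), float)
    height, width, depth = pixmap.shape

    dx = width / nCols               # Floating point grid spacing
    dy = height / nRows
    xSize = 1 + (width-1)//nCols     # Whole number of pixels per block
    ySize = 1 + (height-1)//nRows

    for row in range(nRows):
        yStart = int(row*dy)
        yEnd = yStart + ySize

        for col in range(nCols):
            xStart = int(col*dx)
            xEnd = xStart + xSize

            elementData = pixmap[yStart:yEnd, xStart:xEnd]
            dataMatrix[:, row, col] = elementData.sum(axis=(0, 1))

    return dataMatrix

# A 4x4 image of ones on a 2x2 grid: each block holds 2x2 = 4 pixels
evenSignals = gridSignals(np.ones((4, 4, 3)), 2)

# A 5x5 image of ones on a 2x2 grid: each block holds 3x3 = 9 pixels
unevenSignals = gridSignals(np.ones((5, 5, 3)), 2)

print(evenSignals[0])
print(unevenSignals[0])
```

In the second case the four blocks total 36 pixels per channel although the image has only 25, which shows the shared pixel row and column that arise when the image size is not a multiple of the grid size.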
