A Python neural network
The feed-forward neural network example in Python has been split into two functions: one
that makes predictions and one that does the training. It would also be possible to
construct this neural network using classes (custom kinds of Python objects), and this may
hold certain advantages, like the ability to make adapted subclasses. However, using
functions makes it simpler to describe the principles of what is happening.
The first function, neuralNetPredict, takes some input data for the first
layer of network nodes, applies the first weighted connections and trigger functions to
pass the signal to the hidden layer of nodes and then applies the second weights and
triggers to generate some output. This is used both during the training of the network, to
set up the connection weights, and to make predictions on unseen data. Initially some
mathematical functions are imported from the NumPy library, so that we can express the
operations concisely as arrays and matrices.
from numpy import array, tanh, zeros, ones, random, sum, append
Then we define the function name and its input arguments: an array of input features
(inputVec) and two matrices that represent the connection weights. The matrix weightsIn
represents the strength of connection between the input nodes (which include the bias
node we describe below) and the hidden nodes. Likewise, weightsOut represents the
strengths between the hidden and the output nodes. The weights are represented as
matrices so that the rows correspond to a set of nodes in one layer and the columns
represent the set of nodes in the other layer, to connect everything in one layer to
everything in the other. For example, if the network has four input nodes (counting the bias
node), five hidden and two output nodes, then weightsIn will be a 4×5 matrix and weightsOut
will be a 5×2 matrix.
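As a minimal sketch of those shapes (the variable names and sizes here are purely illustrative; the random initialisation mirrors what the training function does later), the two weight matrices for that example network could be created and checked like this:

from numpy import random

weightsIn = random.random((4, 5)) - 0.5    # input (incl. bias) -> hidden
weightsOut = random.random((5, 2)) - 0.5   # hidden -> output
print(weightsIn.shape, weightsOut.shape)   # (4, 5) (5, 2)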
Inside the function the first step is to define the signalIn vector for the network. This is
simply a copy of the input features array with an extra value of 1.0 appended to the end.
This extra, fixed input is what is known as a bias node, and is present so the baseline (the
level without meaningful signal) of an input can be adjusted. This gives more flexibility in
the trigger function used for the hidden layer of nodes, which improves learning. The
weight matrices must be of the right size to account for the bias node, and although the
weights from the bias node are still adjusted by training they are naturally not affected by
the input data. A bias connection going to each hidden node allows the input to that node
to be offset, effectively shifting the centre of the trigger function so that it can better
distinguish the input values; the upshot of this is that the programmer doesn’t have to
worry about centring input feature values (e.g. making their mean values zero).
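As a tiny illustration of that bias input (the feature values are made up), appending the fixed 1.0 simply adds one more element to the signal entering the network:

from numpy import array, append

features = array([0.5, -1.2, 3.0])
signalIn = append(features, 1.0)   # the fixed bias input goes on the end
print(signalIn)                    # the three feature values plus 1.0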
def neuralNetPredict(inputVec, weightsIn, weightsOut):

  signalIn = append(inputVec, 1.0) # input layer

  prod = signalIn * weightsIn.T
  sums = sum(prod, axis=1)
  signalHid = tanh(sums) # hidden layer

  prod = signalHid * weightsOut.T
  sums = sum(prod, axis=1)
  signalOut = tanh(sums) # output layer

  return signalIn, signalHid, signalOut
The main operation of the function involves multiplying the input vector, element by
element, with the columns of the first matrix of weights. As a result of the training process
we describe later, the weight matrix is arranged so that there is a column for each of the
hidden nodes. Given we want to apply the input signal to each hidden node, we use the
transpose (.T) of the weight matrix so that columns are switched with rows for the
multiplication. This is a requirement because element multiplication of a one-dimensional
NumPy array with a two-dimensional array is done on a per-row basis. Next we sum the
weighted inputs along each row (axis=1), so we get one value for each hidden node. Then
to get the signal that comes from the hidden layer we calculate the hyperbolic tangent of
the sums, applying the sigmoid-shaped trigger function to each. This whole operation is
then repeated in the same manner for going from the hidden layer to the output layer: we
apply weights to the signal vector, sum along the rows and apply the trigger function. The
final output vector is the prediction from the network. At the end of
the function we return all the signal vectors, and although only the output values are useful
in making predictions the other vectors are used in training the network.
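As a small, self-contained illustration of that broadcasting behaviour (the numbers are invented), multiplying a one-dimensional array by a two-dimensional array scales each row element by element, and summing with axis=1 then gives one value per row:

from numpy import array, sum

signal = array([1.0, 2.0, 3.0])        # one value per input node
weightsT = array([[0.1, 0.2, 0.3],     # transposed weights: one row per hidden node
                  [1.0, 1.0, 1.0]])
print(signal * weightsT)               # each row multiplied element by element
print(sum(signal * weightsT, axis=1))  # one weighted sum per hidden node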
The second Python function for the feed-forward neural network is a function to train it
by the back-propagation method, to find an optimal pair of weight matrices. The objective
is to minimise error between the output vectors predicted by the network and the target
values (known because this is training data). Here the error is calculated as the sum of the
squared differences, but other methods may be more appropriate in certain situations. The
function is defined and takes the training data as an argument, which is expected to be an
array containing pairs of items: an input feature vector and the known output vector. The
next argument is the number of nodes in the hidden layer; the size of input and output
layers need not be specified because they can be deduced from the length of the input and
output vectors used in training. The remaining arguments relate to the number of training
steps (cycles over the data) that will be made, a value for the learning rate that governs
how strongly weights are adjusted and a momentum factor that allows each training cycle
to use a fraction of the adjustments that were used in the previous cycle, which makes for
smoother training. In practice the learning rate and momentum factor can be optimised,
but the default values are generally a fair start.
def neuralNetTrain(trainData, numHid, steps=100, rate=0.5, momentum=0.2):
Within the function a few values are initialised. The numbers of nodes in the input and
output layers are extracted from the size of the first item (index zero) of training data,
noting that the number of inputs is then increased by one to accommodate the bias node.
The error value which we aim to minimise starts as None, but will be filled with numeric
values later.
  numInp = len(trainData[0][0])
  numOut = len(trainData[0][1])
  numInp += 1
  minError = None
Next we make the initial signal vectors as arrays of the required sizes (a value comes
from each node) with all elements starting out as 1 courtesy of numpy.ones(). The input
will be the feature vector we pass in and the output will be the prediction.
  sigInp = ones(numInp)
  sigHid = ones(numHid)
  sigOut = ones(numOut)
The initial weight matrices are constructed with random values between −0.5 and 0.5,
with the required number of rows and columns in each. The random.random function
makes matrices of random numbers in the range 0.0 to 1.0, but by taking 0.5 away (from
every element) we shift this range. This particular range is not a strict requirement, but is a
fairly good general strategy; too small and the network can get stuck, but too large and the
learning is stifled. The best weight matrices, which are what we are going to pass back from
the function at the end of training, start as these initial weights but then improve.
  wInp = random.random((numInp, numHid)) - 0.5
  wOut = random.random((numHid, numOut)) - 0.5
  bestWeightMatrices = (wInp, wOut)
The next initialisation is for the change matrices, which will indicate how much the
weight matrices differ from one training cycle to the next. These are important so that
there is a degree of memory or momentum in the training; strong corrections to the
weights will tend to keep going and help convergence.
  cInp = zeros((numInp, numHid))
  cOut = zeros((numHid, numOut))
The final initialisation is for the training data: pairs of input and output vectors. This is
done to convert all of the vectors into the numpy.array data type, thus allowing the training
data to be input as lists and/or tuples. We simply loop through the data, extract each pair,
convert to arrays and then put the pair back in the list at the appropriate index (x).
  for x, (inputs, knownOut) in enumerate(trainData):
    trainData[x] = (array(inputs), array(knownOut))
With everything initialised, we can then begin the actual network training, so we go
through the required number of loops and in Python 2 use xrange() so that a large list
doesn’t have to be created. Note we don’t use a while loop to check for convergence on
the error because a neural network is not always guaranteed to converge and sometimes it
can stall before convergence. For each step we shuffle the training data, which is often
very important for training; without this there is a bias in the way the weights get
optimised. After the shuffle, the error starts at zero for the cycle.
  for step in range(steps): # xrange() in Python 2

    random.shuffle(trainData) # Important
    error = 0.0
Next we loop through all of the training data, getting the input feature vector and
known output for each example. We then use the current values of the weight matrices,
with the prediction function described above, to calculate the signal vectors. Initially the
output signal vector (the prediction) will be quite different from the known output vector,
but this will hopefully improve over time.
    for inputs, knownOut in trainData:
      sigIn, sigHid, sigOut = neuralNetPredict(inputs, wInp, wOut)
Given the neural network signals that come from the current estimates for weight
matrices we now apply the back-propagation method to try to reduce the error in the
prediction. Thus we calculate the difference between the known output vector and the
signal output from the neural network. This difference is squared and summed up over all
the features (diff is an array) before being added to the total error for this cycle.
      diff = knownOut - sigOut
      error += sum(diff * diff)
Next we work out an adjustment that will be made to the output weights, to hopefully
reduce the error. The adjustment is calculated from the gradient of the trigger function.
Because this example uses a hyperbolic tangent function, the gradient at the signal value is
one minus the signal value squared (differentiate y = tanh(x) and you get 1 − tanh²(x),
which equals 1 − y²). The signal gradient multiplied by the signal difference then
represents the change in the signal before the trigger function, which can be used to adjust
the weight matrices. Note that all these mathematical operations are performed on all the
elements of whole arrays at once, courtesy of NumPy.
      gradient = ones(numOut) - (sigOut*sigOut)
      outAdjust = gradient * diff
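If desired, the gradient formula can be checked numerically; this short sketch (not part of the original listing) compares a finite-difference estimate of the tanh derivative with 1 − tanh²(x):

from numpy import tanh

x, h = 0.3, 1e-6
numerical = (tanh(x + h) - tanh(x)) / h   # finite-difference slope of tanh at x
analytic = 1.0 - tanh(x)**2               # the gradient formula used above
print(numerical, analytic)                # the two values agree closely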
The same kind of operation is repeated for the hidden layer, to find the adjustment that
will be made for the input weight matrix. Again, we calculate a signal difference and a
trigger function gradient and multiply them to get an adjustment for what goes into the
trigger function. However, this time we can’t compare output vectors, so instead we take
the array of signal adjustments just calculated and propagate them back through the
network. Thus the signal difference for the hidden layer is calculated by taking the signal
adjustment for the output layer and passing it through the output weight matrix, i.e.
backwards through the last layer.
      diff = sum(outAdjust * wOut, axis=1)
      gradient = ones(numHid) - (sigHid*sigHid)
      hidAdjust = gradient * diff
With the adjustments calculated it then remains to make the changes to the weight
matrices, and hopefully get an improvement in the error. The weight change going from
hidden to output layers requires that we calculate a change matrix (the same size as the
weights), hence we take the vector of adjustments and the vector of hidden signals and
combine them; the row of adjustments (one value per output node) is multiplied by a column of
signals (one value per hidden node) to give the matrix of weight changes. Note how we use the reshape()
function to convert the array of signals, a single row, into a column vector; it is tipped on
its side so that the multiplication can be made to generate a matrix with rows and columns.
      # update output
      change = outAdjust * sigHid.reshape(numHid, 1)
      wOut += (rate * change) + (momentum * cOut)
      cOut = change
In the same manner the changes are made to the input weight matrix.
      # update input
      change = hidAdjust * sigIn.reshape(numInp, 1)
      wInp += (rate * change) + (momentum * cInp)
      cInp = change
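The reshape trick used in both updates can be seen in isolation with invented numbers: tipping one vector into a column and multiplying by the other (a row) produces a matrix of all the pairwise products, the same shape as the weight matrix being updated.

from numpy import array

sigHidDemo = array([0.2, -0.4, 0.9])        # e.g. three hidden-node signals
outAdjustDemo = array([0.1, 0.5])           # e.g. two output adjustments
change = outAdjustDemo * sigHidDemo.reshape(3, 1)
print(change.shape)                         # (3, 2), the same shape as wOut
print(change)                               # every signal paired with every adjustment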
Then finally in the training cycle, we see if the minimum error has been improved on.
During the first cycle the minimum error is None, so we always fill it with the first real
calculated error value in that case. Each time we find a new minimum error we record the
best weight matrices (so far) by taking copies of the current versions, using the handy
.copy() function of NumPy arrays. Finally, at the end of all of the training cycles, the
best weight matrices are returned.
    if (minError is None) or (error < minError):
      minError = error
      bestWeightMatrices = (wInp.copy(), wOut.copy())
      print("Step: %d Error: %f" % (step, error))

  return bestWeightMatrices
We can test the feed-forward neural network using some example training data. As a very
simple example, the first test takes input vectors with a pair of numbers which are either
one or zero. The output corresponds to the ‘exclusive or’ (XOR) logic function: the output
is 1 if either of the inputs is 1, but not both. This test data is a list of [input, output] pairs.
Note that even though the output is just a single number it is nonetheless represented as a
list with a single item.
data = [[[0,0], [0]],
        [[0,1], [1]],
        [[1,0], [1]],
        [[1,1], [0]]]
The number of hidden nodes used here is simply stated as 2, but in practical situations
several values will need to be tried, and their performance evaluated. Then we run the
training function on the data to estimate the best weight matrices for the neural network.
wMatrixIn, wMatrixOut = neuralNetTrain(data, 2, 1000)
The output weight matrices can then be run on test data for evaluation. At the very least
they ought to do a reasonable job at predicting the output signals for the training set,
although in practice the evaluation really ought to be done on data that was not used in training.
for inputs, knownOut in data:
  sIn, sHid, sOut = neuralNetPredict(array(inputs), wMatrixIn, wMatrixOut)
  print(knownOut, sOut[0])
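As a simple follow-up (not part of the original example), the raw output signal could be thresholded, here at an assumed midpoint of 0.5, to give a hard 0/1 prediction for comparison with the known XOR values:

for inputs, knownOut in data:
  sIn, sHid, sOut = neuralNetPredict(array(inputs), wMatrixIn, wMatrixOut)
  prediction = 1 if sOut[0] > 0.5 else 0   # the 0.5 cut-off is an arbitrary choice here
  print(inputs, knownOut[0], prediction)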