A Python neural network
The feed-forward neural network example in Python has been split into two functions: one
that makes predictions and one that does the training. It would also be possible to
construct this neural network using classes (custom kinds of Python objects), and this may
hold certain advantages, like the ability to make adapted subclasses. However, using
functions makes it simpler to describe the principles of what is happening.
The first function, neuralNetPredict, takes some input data for the first
layer of network nodes, applies the first weighted connections and trigger functions to
pass the signal to the hidden layer of nodes and then applies the second weights and
triggers to generate some output. This is used both during the training of the network, to
set up the connection weights, and to make predictions on unseen data. Initially some
mathematical functions are imported from the NumPy library, so that we can express the
operations concisely as arrays and matrices.
from numpy import array, tanh, zeros, ones, random, sum, append
Then we define the function name and its input arguments: an array of input features
(inputVec) and two matrices that represent the connection weights. The matrix weightsIn
represents the strength of connection between the input nodes (which include the bias
node we describe below) and the hidden nodes. Likewise, weightsOut represents the
strengths between the hidden and the output nodes. The weights are represented as
matrices so that the rows correspond to a set of nodes in one layer and the columns
represent the set of nodes in the other layer, to connect everything in one layer to
everything in the other. For example, if the network has four input nodes (counting the bias
node), five hidden and two output nodes, then weightsIn will be a 4×5 matrix and weightsOut
will be a 5×2 matrix.
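As a minimal sketch of those shapes (the variable names and sizes here are purely illustrative; the random initialisation mirrors what the training function does later), the two weight matrices for that example network could be created and checked like this:

from numpy import random

weightsIn = random.random((4, 5)) - 0.5    # input (incl. bias) -> hidden
weightsOut = random.random((5, 2)) - 0.5   # hidden -> output
print(weightsIn.shape, weightsOut.shape)   # (4, 5) (5, 2)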
Inside the function the first step is to define the signalIn vector for the network. This is
simply a copy of the input features array with an extra value of 1.0 appended to the end.
This extra, fixed input is what is known as a bias node, and is present so the baseline (the
level without meaningful signal) of an input can be adjusted. This gives more flexibility in
the trigger function used for the hidden layer of nodes, which improves learning. The
weight matrices must be of the right size to account for the bias node, and although the
weights from the bias node are still adjusted by training they are naturally not affected by
the input data. A bias connection going to each hidden node allows the input to that node
to be offset, effectively shifting the centre of the trigger function so that it can better
distinguish the input values; the upshot of this is that the programmer doesn’t have to
worry about centring input feature values (e.g. making their mean values zero).
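As a tiny illustration of that bias input (the feature values are made up), appending the fixed 1.0 simply adds one more element to the signal entering the network:

from numpy import array, append

features = array([0.5, -1.2, 3.0])
signalIn = append(features, 1.0)   # the fixed bias input goes on the end
print(signalIn)                    # the three feature values plus 1.0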
def neuralNetPredict(inputVec, weightsIn, weightsOut):

  signalIn = append(inputVec, 1.0) # input layer

  prod = signalIn * weightsIn.T
  sums = sum(prod, axis=1)
  signalHid = tanh(sums) # hidden layer

  prod = signalHid * weightsOut.T
  sums = sum(prod, axis=1)
  signalOut = tanh(sums) # output layer

  return signalIn, signalHid, signalOut
The main operation of the function involves multiplying the input vector, element by
element, with the columns of the first matrix of weights. As a result of the training process
we describe later, the weight matrix is arranged so that there is a column for each of the
hidden nodes. Given we want to apply the input signal to each hidden node, we use the
transpose (.T) of the weight matrix so that columns are switched with rows for the
multiplication. This is a requirement because element multiplication of a one-dimensional
NumPy array with a two-dimensional array is done on a per-row basis. Next we sum the
weighted inputs along each row (axis=1), so we get one value for each hidden node. Then
to get the signal that comes from the hidden layer we calculate the hyperbolic tangent of
the sums, applying the sigmoid-shaped trigger function to each. This whole operation is
then repeated in the same manner for going from the hidden layer to the output layer: we
apply weights to the signal vector, sum along the rows and apply the trigger function. The
final output vector is the prediction from the network. At the end of
the function we return all the signal vectors, and although only the output values are useful
in making predictions the other vectors are used in training the network.
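As a small, self-contained illustration of that broadcasting behaviour (the numbers are invented), multiplying a one-dimensional array by a two-dimensional array scales each row element by element, and summing with axis=1 then gives one value per row:

from numpy import array, sum

signal = array([1.0, 2.0, 3.0])        # one value per input node
weightsT = array([[0.1, 0.2, 0.3],     # transposed weights: one row per hidden node
                  [1.0, 1.0, 1.0]])
print(signal * weightsT)               # each row multiplied element by element
print(sum(signal * weightsT, axis=1))  # one weighted sum per hidden node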
The second Python function for the feed-forward neural network is a function to train it
by the back-propagation method, to find an optimal pair of weight matrices. The objective
is to minimise error between the output vectors predicted by the network and the target
values (known because this is training data). Here the error is calculated as the sum of the
squared differences, but other methods may be more appropriate in certain situations. The
function is defined and takes the training data as an argument, which is expected to be an
array containing pairs of items: an input feature vector and the known output vector. The
next argument is the number of nodes in the hidden layer; the size of input and output
layers need not be specified because they can be deduced from the length of the input and
output vectors used in training. The remaining arguments relate to the number of training
steps (cycles over the data) that will be made, a value for the learning rate that governs
how strongly weights are adjusted and a momentum factor that allows each training cycle
to use a fraction of the adjustments that were used in the previous cycle, which makes for
smoother training. In practice the learning rate and momentum factor can be optimised,
but the default values are generally a fair start.
def neuralNetTrain(trainData, numHid, steps=100, rate=0.5, momentum=0.2):
Within the function a few values are initialised. The numbers of nodes in the input and
output layers are extracted from the size of the first item (index zero) of training data,
noting that the number of inputs is then increased by one to accommodate the bias node.
The error value which we aim to minimise starts as None, but will be filled with numeric
values later.
  numInp = len(trainData[0][0])
  numOut = len(trainData[0][1])
  numInp += 1
  minError = None
Next we make the initial signal vectors as arrays of the required sizes (a value comes
from each node) with all elements starting out as 1 courtesy of numpy.ones(). The input
will be the feature vector we pass in and the output will be the prediction.
  sigInp = ones(numInp)
  sigHid = ones(numHid)
  sigOut = ones(numOut)
The initial weight matrices are constructed with random values between −0.5 and 0.5,
with the required number of rows and columns in each. The random.random function
makes matrices of random numbers in the range 0.0 to 1.0, but by taking 0.5 away (from
every element) we shift this range. This particular range is not a strict requirement, but is a
fairly good general strategy; too small and the network can get stuck, but too large and the
learning is stifled. The best weight matrices, which are what we are going to pass back from
the function at the end of training, start as these initial weights but then improve.
  wInp = random.random((numInp, numHid)) - 0.5
  wOut = random.random((numHid, numOut)) - 0.5
  bestWeightMatrices = (wInp, wOut)
The next initialisation is for the change matrices, which will indicate how much the
weight matrices differ from one training cycle to the next. These are important so that
there is a degree of memory or momentum in the training; strong corrections to the
weights will tend to keep going and help convergence.
  cInp = zeros((numInp, numHid))
  cOut = zeros((numHid, numOut))
The final initialisation is for the training data: pairs of input and output vectors. This is
done to convert all of the vectors into the numpy.array data type, thus allowing the training
data to be input as lists and/or tuples. We simply loop through the data, extract each pair,
convert to arrays and then put the pair back in the list at the appropriate index (x).
  for x, (inputs, knownOut) in enumerate(trainData):
    trainData[x] = (array(inputs), array(knownOut))
With everything initialised, we can then begin the actual network training, so we go
through the required number of loops and in Python 2 use xrange() so that a large list
doesn’t have to be created. Note we don’t use a while loop to check for convergence on
the error because a neural network is not always guaranteed to converge and sometimes it
can stall before convergence. For each step we shuffle the training data, which is often
very important for training; without this there is a bias in the way the weights get
optimised. After the shuffle, the error starts at zero for the cycle.
  for step in range(steps): # xrange() in Python 2

    random.shuffle(trainData) # Important
    error = 0.0
Next we loop through all of the training data, getting the input feature vector and
known output for each example. We then use the current values of the weight matrices,
with the prediction function described above, to calculate the signal vectors. Initially the
output signal vector (the prediction) will be quite different from the known output vector,
but this will hopefully improve over time.
    for inputs, knownOut in trainData:
      sigIn, sigHid, sigOut = neuralNetPredict(inputs, wInp, wOut)
Given the neural network signals that come from the current estimates for weight
matrices we now apply the back-propagation method to try to reduce the error in the
prediction. Thus we calculate the difference between the known output vector and the
signal output from the neural network. This difference is squared and summed up over all
the features (diff is an array) before being added to the total error for this cycle.
      diff = knownOut - sigOut
      error += sum(diff * diff)
Next we work out an adjustment that will be made to the output weights, to hopefully
reduce the error. The adjustment is calculated from the gradient of the trigger function.
Because this example uses a hyperbolic tangent function, the gradient at the signal value is
one minus the signal value squared (differentiate y = tanh(x) and you get 1 − tanh²(x),
which equals 1 − y²). The signal gradient multiplied by the signal difference then
represents the change in the signal before the trigger function, which can be used to adjust
the weight matrices. Note that all these mathematical operations are performed on all the
elements of whole arrays at once, courtesy of NumPy.
      gradient = ones(numOut) - (sigOut*sigOut)
      outAdjust = gradient * diff
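If desired, the gradient formula can be checked numerically; this short sketch (not part of the original listing) compares a finite-difference estimate of the tanh derivative with 1 − tanh²(x):

from numpy import tanh

x, h = 0.3, 1e-6
numerical = (tanh(x + h) - tanh(x)) / h   # finite-difference slope of tanh at x
analytic = 1.0 - tanh(x)**2               # the gradient formula used above
print(numerical, analytic)                # the two values agree closely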
The same kind of operation is repeated for the hidden layer, to find the adjustment that
will be made for the input weight matrix. Again, we calculate a signal difference and a
trigger function gradient and multiply them to get an adjustment for what goes into the
trigger function. However, this time we can’t compare output vectors, so instead we take
the array of signal adjustments just calculated and propagate them back through the
network. Thus the signal difference for the hidden layer is calculated by taking the signal
adjustment for the output layer and passing it through the output weight matrix, i.e.
backwards through the last layer.
      diff = sum(outAdjust * wOut, axis=1)
      gradient = ones(numHid) - (sigHid*sigHid)
      hidAdjust = gradient * diff
With the adjustments calculated it then remains to make the changes to the weight
matrices, and hopefully get an improvement in the error. The weight change going from
hidden to output layers requires that we calculate a change matrix (the same size as the
weights), hence we take the vector of adjustments and the vector of hidden signals and
combine them; the row of adjustments (one value per output node) is multiplied by a column of
signals (one value per hidden node) to give the matrix of weight changes. Note how we use the reshape()
function to convert the array of signals, a single row, into a column vector; it is tipped on
its side so that the multiplication can be made to generate a matrix with rows and columns.
      # update output
      change = outAdjust * sigHid.reshape(numHid, 1)
      wOut += (rate * change) + (momentum * cOut)
      cOut = change
In the same manner the changes are made to the input weight matrix.
      # update input
      change = hidAdjust * sigIn.reshape(numInp, 1)
      wInp += (rate * change) + (momentum * cInp)
      cInp = change
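The reshape trick used in both updates can be seen in isolation with invented numbers: tipping one vector into a column and multiplying by the other (a row) produces a matrix of all the pairwise products, the same shape as the weight matrix being updated.

from numpy import array

sigHidDemo = array([0.2, -0.4, 0.9])        # e.g. three hidden-node signals
outAdjustDemo = array([0.1, 0.5])           # e.g. two output adjustments
change = outAdjustDemo * sigHidDemo.reshape(3, 1)
print(change.shape)                         # (3, 2), the same shape as wOut
print(change)                               # every signal paired with every adjustment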
Then finally in the training cycle, we see if the minimum error has been improved on.
During the first cycle the minimum error is None, so we always fill it with the first real
calculated error value in that case. Each time we find a new minimum error we record the
best weight matrices (so far) by taking copies of the current versions, using the handy
.copy() function of NumPy arrays. Finally, at the end of all of the training cycles, the
best weight matrices are returned.
    if (minError is None) or (error < minError):
      minError = error
      bestWeightMatrices = (wInp.copy(), wOut.copy())
      print("Step: %d Error: %f" % (step, error))

  return bestWeightMatrices
We can test the feed-forward neural network using some example training data. As a very
simple example, the first test takes input vectors with a pair of numbers which are either
one or zero. The output corresponds to the ‘exclusive or’ (XOR) logic function: the output
is 1 if either of the inputs is 1, but not both. This test data is a list of [input, output] pairs.
Note that even though the output is just a single number it is nonetheless represented as a
list with a single item.
data = [[[0,0], [0]],
        [[0,1], [1]],
        [[1,0], [1]],
        [[1,1], [0]]]
The number of hidden nodes used here is simply stated as 2, but in practical situations
several values will need to be tried, and their performance evaluated. Then we run the
training function on the data to estimate the best weight matrices for the neural network.
wMatrixIn, wMatrixOut = neuralNetTrain(data, 2, 1000)
The output weight matrices can then be run on test data for evaluation. At the very least
they ought to do a reasonable job at predicting the output signals for the training set,
although in practice the evaluation really ought to be done on data that was not used in training.
for inputs, knownOut in data:
  sIn, sHid, sOut = neuralNetPredict(array(inputs), wMatrixIn, wMatrixOut)
  print(knownOut, sOut[0])
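As a simple follow-up (not part of the original example), the raw output signal could be thresholded, here at an assumed midpoint of 0.5, to give a hard 0/1 prediction for comparison with the known XOR values:

for inputs, knownOut in data:
  sIn, sHid, sOut = neuralNetPredict(array(inputs), wMatrixIn, wMatrixOut)
  prediction = 1 if sOut[0] > 0.5 else 0   # the 0.5 cut-off is an arbitrary choice here
  print(inputs, knownOut[0], prediction)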