Training a neural network by back propagation
The artificial neural network presented here is trained via a mechanism known as back propagation. This is a fairly efficient general solution for training, but other ways of finding network connection weights are possible, such as the slower but more rigorous Markov chain Monte Carlo (see Chapter 25). The back-propagation mechanism takes the known output values for the input training data and adjusts the connection weights between the nodes, working backward layer by layer from the output layer, via the hidden layer, to the input layer. The
objective at each stage is to minimise the error between the fixed, known result and the actual network output (the prediction). The weights are adjusted a little to reduce the error for each item of training data in turn, and it is often important to randomise the order of the data. Because different training examples may compete with one another (pulling the weights in different directions), and because a given node is influenced by many others, there is no way to know in advance exactly how the weights should be adjusted to make the output match. Hence training is a slow and cautious process: the procedure goes through all the training data many times while the connection weights settle into a hopefully stable pattern. The actual amount by which the weights are adjusted for each item of data in each cycle naturally depends on the kind of trigger function used by the nodes, but in general the gradient of that function indicates the direction in which the inputs to a node should be adjusted to better match the output.
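To make this concrete, the following is a minimal sketch of such training in Python with NumPy: a single hidden layer of sigmoid trigger functions, with every weight nudged slightly for each item of data, in a randomised order, over many cycles. The network size, learning rate, bias handling and toy XOR data are illustrative assumptions rather than details fixed by the text.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def trainNetwork(inputs, targets, nHidden=4, rate=0.5, nCycles=10000, seed=0):
    rng = np.random.default_rng(seed)
    nIn = inputs.shape[1] + 1  # +1 for a constant bias input
    wHid = rng.normal(scale=0.5, size=(nIn, nHidden))                   # input to hidden
    wOut = rng.normal(scale=0.5, size=(nHidden + 1, targets.shape[1]))  # hidden to output

    for cycle in range(nCycles):
        for i in rng.permutation(len(inputs)):  # randomise the data order
            vec = np.append(inputs[i], 1.0)     # input signal plus bias node

            # Forward pass: send one item of data through the layers
            hid = np.append(sigmoid(vec @ wHid), 1.0)
            out = sigmoid(hid @ wOut)

            # Backward pass: the sigmoid gradient out*(1-out) gives the
            # direction in which each node's summed input should change
            errOut = (targets[i] - out) * out * (1.0 - out)
            errHid = (errOut @ wOut[:-1].T) * hid[:-1] * (1.0 - hid[:-1])

            # Adjust the weights a little, working back from output to input
            wOut += rate * np.outer(hid, errOut)
            wHid += rate * np.outer(vec, errHid)

    return wHid, wOut

# Toy usage: learn the XOR pattern
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
known = np.array([[0], [1], [1], [0]], dtype=float)
wHid, wOut = trainNetwork(data, known)

for vec in data:
    hid = np.append(sigmoid(np.append(vec, 1.0) @ wHid), 1.0)
    print(vec, sigmoid(hid @ wOut))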
The programmer should always be cautious when training an artificial neural network, and it can only legitimately be used to make predictions if its performance is properly tested on data that it has never seen before; it is commonplace to hold back some of the training data set for this testing. Also, these networks can suffer from over-training, where the network learns to associate the training input and output too well: it becomes too specialised and performs poorly on data it has not seen before. Over-training can be minimised by selecting a widely spread set of training examples, by optimising performance against data held back from training, and by not worrying too much about small improvements in the connection weight optimisation.
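As a sketch of this hold-back testing, assuming the sigmoid() and trainNetwork() functions from the example above and data/known arrays holding a reasonably large set of examples, one might write something like the following; the 80:20 split and the meanError() helper are illustrative choices, not a prescribed recipe.

import numpy as np

def meanError(inputs, targets, wHid, wOut):
    # Mean squared prediction error over a set of examples
    total = 0.0
    for vec, target in zip(inputs, targets):
        hid = np.append(sigmoid(np.append(vec, 1.0) @ wHid), 1.0)
        total += np.sum((target - sigmoid(hid @ wOut)) ** 2)
    return total / len(inputs)

rng = np.random.default_rng(1)
order = rng.permutation(len(data))
nTrain = int(0.8 * len(data))  # hold back 20% of the data for testing
trainIdx, testIdx = order[:nTrain], order[nTrain:]

wHid, wOut = trainNetwork(data[trainIdx], known[trainIdx])

# A much larger test error than training error is a sign of over-training
print('Training error:', meanError(data[trainIdx], known[trainIdx], wHid, wOut))
print('Test error:    ', meanError(data[testIdx], known[testIdx], wHid, wOut))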
Even with these precautions, the user also has to be mindful, as with any machine learning, that the problem being addressed is well formulated. There is the anecdotal example of the military neural network that was designed to automatically distinguish between pictures of friendly and enemy tanks. In training, this neural network seemed to work very well, but in the real world it performed poorly. It turned out that the pictures of friendly and enemy tanks generally had different kinds of backgrounds, and the network had learned the classification from the (easier to distinguish) terrain type, not from the tanks. Putting an enemy tank in front of some trees made it look friendly, at least as far as the neural network was concerned. The moral here is to use only input data that is unbiased and relevant.