Python Programming for Biology: Bioinformatics and Beyond
Figure 24.4 (Plate 10). Example self-organising map output. Results from an initially random 100×100 colour pixel map (left) and the effect of the self-organising map on clustering the colours after 1, 10 and 100 iterations.

Feed-forward artificial neural networks

The next machine learning example that we will cover is another kind of artificial neural network, but this time it is one that will undergo supervised learning. This means that when the network ‘learns’, it takes input data (more feature vectors) and changes its internal weights so that it can reproduce a known answer. The supervisory process whereby the programmer adjusts the network so that it gives the right answer, or as close to the right answer as possible, for some known data is usually referred to as training. Naturally, when training a neural network it is important to have as large and as representative a set of training data as possible. The predictive power comes from the fact that the neural network can accept input data that it has not seen before, data that was not used in the training. Predictions can be made for unseen data because inputs that resemble those that were used during the initial training will give similar outputs. In this regard it doesn’t actually matter very much what the input or output data represents; the patterns and connections between them can be learnt nonetheless.
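
To make this concrete, supervised training data amounts to feature vectors paired with the known answers the network should reproduce. A minimal sketch follows; the colour examples and category labels are invented here for illustration and are not taken from the book’s own data:

# Known inputs paired with the answers the network should learn;
# here each RGB colour feature vector is labelled with a
# two-node category output (hypothetical example data)
trainingData = [([1.0, 0.0, 0.0], [1.0, 0.0]),   # red  -> category A
                ([0.9, 0.1, 0.1], [1.0, 0.0]),
                ([0.0, 0.0, 1.0], [0.0, 1.0]),   # blue -> category B
                ([0.1, 0.1, 0.9], [0.0, 1.0])]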

The neural network that we describe below is composed of a series of nodes arranged into three layers. The prediction of this network will proceed by a feed-forward mechanism, whereby input (often referred to as ‘signal’) is entered into the first layer of nodes. This input data is then moved to the middle or hidden layer to which it is connected, before finally reaching the last output layer of nodes. It is possible to construct feed-forward networks with more than three layers (i.e. more hidden layers). However, these can be more difficult to train and it has been shown that for many situations three layers are sufficient to do everything that more layers can do¹⁰ (although the number of nodes will differ).

The number of nodes in the three-layer network depends on the problem being addressed. The number of input nodes represents the size of the input vector; the value of each feature goes to a different input node. For example, if the input was a colour with red, green and blue features, there would be three input nodes. If the input was a DNA sequence composed of four base letters, there would be four input nodes for each position of the sequence analysed; thus a sequence of length ten would need 40 inputs. The number of output nodes depends on the problem, but there is some flexibility to represent the data in different ways. For example, if the network is used to predict an angle then the output could be a single number, or it could be the sine and the cosine of the angle separately. When being used for categorisation, there would be as many output nodes as there are categories. If the neural network is instead being used to approximate a continuous function, then the output will have a variable number of nodes, depending on how many axes are required. The number of hidden nodes used will depend on the type and complexity of the problem, but will normally be optimised to give the best predictions. Numbers between three and ten are common. The smaller the number of nodes, the quicker it is to optimise the network during training, but the fewer the patterns that can be detected in the data. The optimum number of hidden nodes can often be smaller than the number of inputs but is usually larger than the number of outputs. A convenient way to think of things is that the number of hidden nodes represents the complexity (dimensionality) of the problem, which is not necessarily related to the size of the input or output.
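
To make the input sizing concrete, the short sketch below turns a DNA sequence into a network input vector with four nodes per base position, so a ten-base sequence yields 40 inputs. The function name dnaOneHot and the encoding layout are illustrative assumptions, not the book’s own code:

import numpy as np

def dnaOneHot(seq):
    """Encode a DNA sequence as a flat network input vector,
    using four input nodes (A, C, G, T) per base position."""
    index = {'A':0, 'C':1, 'G':2, 'T':3}
    inputVec = np.zeros(4*len(seq))
    for pos, base in enumerate(seq):
        inputVec[4*pos + index[base]] = 1.0
    return inputVec

inputs = dnaOneHot('GATTACAGCT')  # ten bases -> 40 input values
print(len(inputs))                # 40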

The three layers of nodes in our feed-forward network will be connected together. Each node will be connected to all of the others in a neighbouring layer. Thus, each input node is connected to all hidden nodes; each hidden node is connected to all of the input and output nodes; and each output node to each hidden node. The properties of a neural network emerge because the strength of the connection between nodes can vary during the learning process, so some nodes become more or less well connected. If a connection ends up having a zero weight then its linked nodes are effectively disconnected; thus the network can represent a large number of possible internal organisations. A node will be connected to many others to varying degrees, but the actual feed-forward action of the network that is used to make predictions (generate output) uses what is known as a trigger function to adjust the response. In essence, a node collects input signals on one side and has to combine these in some manner to generate output on the other side, which could be an intermediate or final output signal. The input signals are added together, but the strength of the resulting output, which is sent to any nodes in the next layer, is altered. Firstly, the sum of the combined inputs is scaled to be within certain minimum and maximum bounds for practical purposes. Secondly, the input is applied to the trigger function to increase or decrease the effect that certain amounts of input have. Sometimes the trigger function is a two-state switch where smaller input values produce very little response, but above a particular threshold the response is very strong; this is perhaps analogous to the firing of a neuron inside a brain. However, many types of trigger function are possible, and the one we employ here is the popular hyperbolic tangent function (tanh; see Figure 24.5). Using the sigmoid-shaped hyperbolic tangent curve means that in mid ranges the strength of a node’s output is roughly proportional to its input, but at the high and low input extremes the output is attenuated towards limits. This function also benefits from having an easily calculated gradient (required for training) and has successfully been used in many diverse situations.
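
As a rough illustration of this feed-forward action, the sketch below pushes a signal through one hidden layer with NumPy, applying tanh as the trigger function at each layer. The function name feedForward and the random placeholder weights are assumptions for illustration (real weights would come from training), and the input-scaling step described above is omitted for brevity:

import numpy as np

def feedForward(inputVec, weightsIn, weightsOut):
    """Propagate an input signal through the hidden layer to the
    output layer, applying tanh as the trigger function."""
    signal = np.array(inputVec)
    hidden = np.tanh(np.dot(signal, weightsIn))    # input -> hidden
    output = np.tanh(np.dot(hidden, weightsOut))   # hidden -> output
    return output

# Three inputs (e.g. an RGB colour), four hidden nodes, two outputs;
# random weights stand in for values learnt during training
weightsIn = np.random.uniform(-1.0, 1.0, (3, 4))
weightsOut = np.random.uniform(-1.0, 1.0, (4, 2))
print(feedForward([1.0, 0.5, 0.0], weightsIn, weightsOut))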



