Clearly, the neural network processes input and produces output, and this works on
many types of input data with varying features. However, a critical point to notice is that
this neural network has no notion of the time at which an event (input) occurs; it sees
only the input itself, not when it happened. How can the neural network shown above
handle trends in events, seasonality in events, and so on? How can it learn from the past
and apply it to the present and future? Recurrent neural networks try to address this by
incrementally building the neural network, feeding signals from the previous timestep
into the current network.
Figure 6-6. A recurrent neural network
You can see that an RNN is a neural network with multiple layers, steps, or stages.
Each stage represents a time T; the RNN at time T+1 takes the output of the RNN at
time T as one of its input signals. Each stage passes its output to the next stage. The
hidden state, which is passed from one stage to the next, is the key to why the RNN
works so well; this hidden state is analogous to a form of memory retention. An RNN
layer (or stage) acts as an encoder as it processes the input sequence and returns its
own internal state. This state serves as the input to the decoder in the next stage, which
is trained to predict the next point of the target sequence, given the previous points of
the target sequence. Specifically, it is trained to turn the target sequences into the same
sequences but offset by one timestep in the future.
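To make the idea of passing a hidden state from one stage to the next concrete, here is a minimal forward-pass sketch in plain NumPy. The weight names (W_xh, W_hh, b_h), the tanh activation, and the toy dimensions are illustrative assumptions, not code from any particular library.

import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    # Run a simple RNN over a sequence, returning the hidden state at every timestep.
    hidden_size = W_hh.shape[0]
    h = np.zeros(hidden_size)          # initial hidden state ("memory") starts empty
    hidden_states = []
    for x_t in inputs:                 # one iteration per timestep T
        # The new hidden state depends on the current input AND the previous
        # hidden state -- this is the signal passed from stage T to stage T+1.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return hidden_states

# Toy usage: a sequence of 5 timesteps, each a 3-dimensional input vector.
rng = np.random.default_rng(0)
sequence = [rng.normal(size=3) for _ in range(5)]
W_xh = rng.normal(size=(4, 3))         # input-to-hidden weights
W_hh = rng.normal(size=(4, 4))         # hidden-to-hidden weights (the recurrence)
b_h = np.zeros(4)
states = rnn_forward(sequence, W_xh, W_hh, b_h)
print(len(states), states[-1].shape)   # 5 hidden states, each of size 4

The final hidden state returned here is the internal state an encoder stage would hand to the next (decoder) stage.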
Backpropagation is used when training an RNN, just as in other neural networks, but
in RNNs there is also a time dimension. In backpropagation, we take the derivative
(gradient) of the loss with respect to each of the parameters and then shift the
parameters in the opposite direction of the gradient, with the goal of minimizing the
loss. Because we are moving through time, there is a loss at every timestep; we sum
these losses across all timesteps to get the total loss, which is equivalent to summing
the gradients across time. This procedure is known as backpropagation through
time (BPTT).
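As a small illustration of summing per-timestep losses, the following sketch (with made-up predictions, targets, and a squared-error loss chosen only for illustration) shows that the total loss used for backpropagation through time is simply the sum over timesteps.

import numpy as np

predictions = np.array([0.2, 0.5, 0.9, 1.4])   # network output at timesteps 1..4
targets     = np.array([0.0, 0.4, 1.0, 1.5])   # desired output at each timestep

per_step_loss = (predictions - targets) ** 2   # a loss value at every timestep
total_loss = per_step_loss.sum()               # total loss that is backpropagated

# Because the total loss is a sum, its gradient with respect to any parameter
# is the sum of the per-timestep gradients, which is why summing losses across
# time is the same as summing gradients across time.
print(per_step_loss, total_loss)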
The problem with the above recurrent neural networks, constructed from regular
neural network nodes, is that when we try to model dependencies between sequence
values that are separated by a significant number of other values, the gradient at
timestep T depends on the gradients at T-1, T-2, and so on. As we move back along the
timesteps, the chain of gradients gets longer and longer, and the contribution of the
earliest gradients gets smaller and smaller. This is what is known as the vanishing
gradient problem: the gradients of the earlier layers become so small that the network
cannot learn long-term dependencies. As a result, the RNN becomes biased toward
short-term data points.
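A rough way to see this shrinking contribution is to multiply a gradient by the same recurrent factor once per step back in time; the factor of 0.5 below is an arbitrary stand-in for the product of a recurrent weight and an activation derivative, chosen only to illustrate the effect.

recurrent_factor = 0.5                 # assumed |weight * activation derivative| < 1
gradient_contribution = 1.0
for steps_back in range(1, 11):
    gradient_contribution *= recurrent_factor
    print(f"{steps_back} steps back: contribution ~ {gradient_contribution:.6f}")
# After 10 steps the contribution is about 0.001, so inputs that far back barely
# influence the parameter update -- hence the vanishing gradient problem.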
LSTM networks are one way of solving this problem in RNNs.