Figure 6-11. A sigmoid activation function
The forget gate is the first part of the LSTM stage, and it decides how much information from the prior stage should be remembered or forgotten. This is accomplished by passing the previous hidden state $h_{t-1}$ and the current input $x_t$ through a sigmoid function.
The input gate helps decide how much new information to pass to the current cell state, using both the sigmoid function and a tanh function.
The output gate controls how much information will be retained by the hidden state of this stage and passed on to the next stage. Again, the current cell state passes through the tanh function.
For reference, the compact forms of the equations for the forward pass of an LSTM unit with a forget gate are (source: Wikipedia):
$$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)$$
$$i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)$$
$$o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)$$
$$\tilde{c}_t = \sigma_c(W_c x_t + U_c h_{t-1} + b_c)$$
$$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$$
$$h_t = o_t \circ \sigma_h(c_t)$$

where the initial values are $c_0 = 0$ and $h_0 = 0$, and the operator $\circ$ denotes the element-wise (Hadamard) product. The subscript $t$ indexes the time step.
Figure 6-12. A detailed LSTM network
Source: commons.wikimedia.org
Variables
• $x_t \in \mathbb{R}^d$: Input vector to the LSTM unit
• $f_t \in \mathbb{R}^h$: Forget gate's activation vector
• $i_t \in \mathbb{R}^h$: Input/update gate's activation vector
• $o_t \in \mathbb{R}^h$: Output gate's activation vector
• $h_t \in \mathbb{R}^h$: Hidden state vector, also known as the output vector of the LSTM unit
• $c_t \in \mathbb{R}^h$: Cell state vector
• $W \in \mathbb{R}^{h \times d}$, $U \in \mathbb{R}^{h \times h}$, and $b \in \mathbb{R}^h$: Weight matrices and bias vector parameters, which need to be learned during training
The superscripts $d$ and $h$ refer to the number of input features and the number of hidden units, respectively.
• $\sigma_g$: sigmoid function
• $\sigma_c$: hyperbolic tangent function
• $\sigma_h$: hyperbolic tangent function
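To make the forward-pass equations concrete, here is a minimal NumPy sketch of a single LSTM step. The shapes follow the variable definitions above; the weights are randomly initialized purely for illustration, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM forward step; W, U, and b are dicts keyed by gate name."""
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # input gate
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # output gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # * is the element-wise (Hadamard) product
    h_t = o_t * np.tanh(c_t)            # hidden state / output vector
    return h_t, c_t

d, h = 3, 5  # d input features, h hidden units (illustrative sizes)
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((h, d)) for k in 'fioc'}
U = {k: rng.standard_normal((h, h)) for k in 'fioc'}
b = {k: np.zeros(h) for k in 'fioc'}

# Initial values c_0 = 0 and h_0 = 0, as stated above
h_t, c_t = lstm_step(rng.standard_normal(d), np.zeros(h), np.zeros(h), W, U, b)
```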
LSTM for Anomaly Detection
In this section, you will look at LSTM implementations for some use cases, using time series data as examples. You have a few different time series datasets to use to try to detect anomalies with LSTM. All of them have a timestamp and a value that can easily be plotted in Python.
Figure 6-13 shows the basic code to import all necessary packages. Also note the versions of the various necessary packages.
Figure 6-14 shows the code to visualize the results via a chart for the anomalies and a chart for the errors (the difference between the predicted and actual values) during training.
Figure 6-13. Code to import packages
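The exact code and package versions are in Figure 6-13; as a rough sketch only, an import block for this kind of workflow might look like the following (the specific modules and the tensorflow.keras path are assumptions, not a reproduction of the figure):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Note the versions you are running, as the chapter suggests
print(np.__version__, pd.__version__)
```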
You will use different examples of time series data to detect whether a point is normal/expected or abnormal/anomalous. Figure 6-15 shows the data being loaded into a Pandas dataframe. It shows a list of paths to datasets.
Figure 6-14. Code to visualize errors and anomalies
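Figure 6-14's code isn't reproduced here, but a hypothetical helper in the same spirit, plotting the series with flagged anomalies plus the training errors, could look like this (the 'anomaly' column and the errors argument are assumptions):

```python
import matplotlib.pyplot as plt

def visualize(df, errors):
    # Top chart: the series with anomalous points highlighted
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
    ax1.plot(df['datetime'], df['scaled_value'], label='value')
    flagged = df[df['anomaly'] == 1]  # assumes a boolean/int 'anomaly' column
    ax1.scatter(flagged['datetime'], flagged['scaled_value'],
                color='red', label='anomaly')
    ax1.legend()
    # Bottom chart: per-point prediction errors observed while training
    ax2.plot(errors)
    ax2.set_title('errors')
    plt.show()
```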
Figure 6-15. A list of paths to datasets
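As an illustration of what such a list might look like (these file names are hypothetical, not the book's actual paths):

```python
# Hypothetical paths; the real ones appear in Figure 6-15
dataFilePaths = ['data/nyc_taxi.csv',
                 'data/ambient_temperature.csv',
                 'data/cpu_utilization.csv']
```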
You will now work with one of the datasets in more detail. The dataset is nyc_taxi, which consists of timestamps and the demand for taxis. It shows the NYC taxi demand from 2014-07-01 to 2015-01-31, with an observation every half hour. There are a few detectable anomalies in this dataset: Thanksgiving, Christmas, New Year's Day, a snowstorm, etc.
Figure 6-16 shows the code to select the dataset.
Figure 6-16. Code to select the dataset
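Selecting one dataset could then be as simple as indexing into the list (a sketch, assuming the hypothetical dataFilePaths above; the book's actual code may differ):

```python
dataFilePath = dataFilePaths[0]  # the nyc_taxi dataset
```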
You can load the data from the dataFilePath as a CSV file using Pandas. Figure 6-17 shows the code to read the CSV datafile into Pandas.
Figure 6-17. Code to read a csv datafile into Pandas
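A minimal version of that read, assuming the file has a timestamp column and a value column:

```python
import pandas as pd

df = pd.read_csv(dataFilePath)
df['datetime'] = pd.to_datetime(df['timestamp'])  # parse timestamps for plotting
print(df.shape)
```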
Figure 6-18 shows the plot of the time series, with the months on the x-axis and the value on the y-axis, along with the code to generate the graph.
Let’s understand the data more. You can run the describe() command to look at the value column. Figure 6-19 shows the code to describe the value column.
Figure 6-18. Plotting the time series
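A sketch of such a plot with matplotlib, using the df loaded above:

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(14, 5))
plt.plot(df['datetime'], df['value'])
plt.xlabel('month')
plt.ylabel('value')
plt.show()
```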
Figure 6-19. Describing the value column
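The describe() call itself is a one-liner:

```python
# Summary statistics for the demand column; per the text, the minimum
# is 8 and the maximum is 39,197
print(df['value'].describe())
```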
You can also plot the data using a seaborn kde plot, as shown in Figure 6-20.
The data points have a minimum of 8 and a maximum of 39,197, which is a wide range. You can use scaling to normalize the data. The formula for scaling is (x - Min) / (Max - Min). Figure 6-21 shows the code to scale the data.
Figure 6-20. Using kde to plot the value column
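A sketch of the kde plot:

```python
import seaborn as sns
import matplotlib.pyplot as plt

sns.kdeplot(df['value'])  # kernel density estimate of the raw demand
plt.show()
```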
Figure 6-21. Code to scale the data
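One way to implement that formula is with scikit-learn's MinMaxScaler, which computes exactly (x - Min) / (Max - Min); the book's figure may compute it directly instead:

```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
df['scaled_value'] = scaler.fit_transform(df[['value']]).flatten()

# Re-plot the distribution after scaling (compare with Figure 6-22)
sns.kdeplot(df['scaled_value'])
plt.show()
```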
Now that you have scaled the data, you can plot it again using a seaborn kde plot, as shown in Figure 6-22.
Figure 6-22. Using kde to plot the scaled_value column
You can take a look at the dataframe now that you have scaled the value column. Figure 6-23 shows the dataframe with the timestamp and value columns as well as the scaled_value and datetime columns.
Figure 6-23. The modified dataframe
There are 10,320 data points in the sequence, and your goal is to find anomalies. This means you are trying to find out when data points are abnormal. If you can predict a data point at time T based on the historical data up to T-1, then you have a way of comparing the expected value with the actual value to see whether you are within the expected range of values for time T. If you predicted that ypred taxis would be in demand on January 1, 2015, then you could compare this ypred with the actual value yactual. The difference between ypred and yactual gives the error, and when you collect the errors of all the points in the sequence, you end up with a distribution of errors.
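In code, that error distribution is just the gap between predictions and actual values. A self-contained sketch with illustrative numbers (the threshold rule is one common choice, not the book's prescription):

```python
import numpy as np

# Illustrative arrays; in practice y_actual is the observed series and
# y_pred holds the model's one-step-ahead predictions
y_actual = np.array([0.20, 0.50, 0.90, 0.40])
y_pred = np.array([0.25, 0.45, 0.30, 0.42])

errors = np.abs(y_actual - y_pred)             # per-point prediction error
threshold = errors.mean() + 3 * errors.std()   # e.g., flag beyond 3 std devs
anomalies = errors > threshold                 # boolean anomaly flags
```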
To accomplish this, you will use a sequential model in Keras. The model consists of an LSTM layer and a dense layer. The LSTM layer takes the time series data as input and learns how the values evolve over time. The next layer is the dense (fully connected) layer. The dense layer takes the output from the LSTM layer as input and applies a fully connected transformation. Then, a sigmoid activation is applied to the dense layer's output so that the final value is between 0 and 1.
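A minimal sketch of such a model, assuming input windows of time_steps past observations with one feature each (the layer size and window length are illustrative, not the book's exact choices):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

time_steps = 48  # e.g., one day of half-hourly observations (an assumption)

model = Sequential([
    LSTM(64, input_shape=(time_steps, 1)),  # learns temporal structure
    Dense(1, activation='sigmoid'),         # squashes the output to (0, 1)
])
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()
```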
You also use the