Simple Autoencoders
We will focus on the anomaly detection piece in this chapter. An autoencoder neural network is actually a pair of connected sub-networks, an encoder and a decoder. The encoder network takes in an input and converts it into a smaller, dense representation, also known as a latent representation of the input, which the decoder network can then use to reconstruct the original input as closely as possible. Figure 4-2 shows an example of an autoencoder with encoder and decoder sub-networks.
Figure 4-2. A depiction of an autoencoder
Autoencoders implement a form of data compression in which the compression and decompression functions are implemented by neural networks; these functions are lossy and are learned mostly unsupervised, with little manual intervention. Figure 4-3 shows an expanded view of an autoencoder.
The entire network is usually trained as a whole. The loss function is usually either the mean-squared error or the cross-entropy between the output and the input, known as the reconstruction loss, which penalizes the network for producing outputs that differ from the input. Since the encoding (which is simply the output of the hidden layer in the middle) has far fewer units than the input, the encoder must discard information. The encoder learns to preserve as much of the relevant information as possible in the limited encoding and to intelligently discard the irrelevant parts. The decoder learns to take the encoding and properly reconstruct the input from it. The output has the same form as the input: if you are processing images, the output is an image; if the input is an audio file, the output is an audio file; if the input is a feature-engineered dataset, the output is a dataset too. We will use a credit card transaction sample to illustrate autoencoders in this chapter.
Figure 4-3. Expanded view of an autoencoder
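To make the training setup concrete, here is a minimal sketch of such an autoencoder using the Keras functional API, compiled with mean-squared-error reconstruction loss. The layer sizes and the `build_autoencoder` name are illustrative assumptions, not the book's exact architecture:

```python
from tensorflow.keras import layers, models

def build_autoencoder(input_dim, encoding_dim=8):
    # Encoder: compress the input into a small, dense latent representation.
    inputs = layers.Input(shape=(input_dim,))
    encoded = layers.Dense(32, activation="relu")(inputs)
    encoded = layers.Dense(encoding_dim, activation="relu")(encoded)

    # Decoder: reconstruct the original input from the latent representation.
    decoded = layers.Dense(32, activation="relu")(encoded)
    outputs = layers.Dense(input_dim, activation="linear")(decoded)

    autoencoder = models.Model(inputs, outputs)
    # Mean-squared error between output and input is the reconstruction loss.
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder
```

Because the target is the input itself, training is simply `autoencoder.fit(x_train, x_train, ...)`; no labels are required, which is why the training is described as mostly unsupervised.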
Why do we even bother learning a representation of the original input only to reconstruct the output as well as possible? The answer is that when the input has many features, generating a compressed representation via the hidden layers of the neural network helps compress the input of the training sample. As the neural network goes through all the training data and fine-tunes the weights of all the hidden-layer nodes, the weights come to truly represent the kind of input that we typically see. As a result, if we feed in some other type of data, such as data containing noise, the autoencoder network will be able to detect the noise and remove at least some portion of it when generating the output. This is truly fantastic because now we can potentially remove noise from, for example, images of cats and dogs. Another example is when security monitoring cameras capture hazy, unclear pictures, perhaps in the dark or during adverse weather, resulting in noisy images.
The logic behind the denoising autoencoder is that if we have trained our encoder on good, normal images, then noise that arrives as part of the input is not really a salient characteristic, and it becomes possible to detect and remove it.
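As a rough sketch of that logic (assuming the `build_autoencoder` helper above and training data scaled to [0, 1]), a denoising setup corrupts the inputs with noise while keeping the clean originals as targets:

```python
import numpy as np

def add_gaussian_noise(x, stddev=0.1, seed=42):
    # Corrupt the inputs with Gaussian noise; the clean originals remain the targets.
    rng = np.random.default_rng(seed)
    noisy = x + rng.normal(0.0, stddev, size=x.shape)
    return np.clip(noisy, 0.0, 1.0)

# x_train holds good, normal examples scaled to [0, 1] (an assumption here).
# autoencoder = build_autoencoder(x_train.shape[1])
# autoencoder.fit(add_gaussian_noise(x_train), x_train, epochs=20, batch_size=64)
```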
Figure 4-4 shows the basic code to import all the necessary packages in a Jupyter notebook. Note the versions of the various packages.
Figure 4-4. Importing packages in a Jupyter notebook
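The figure is a screenshot of notebook code; a minimal set of imports along the lines it describes (the exact packages and versions here are assumptions) might be:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.metrics import confusion_matrix

# Note the versions of the various packages -- results can vary across releases.
print("numpy:", np.__version__)
print("pandas:", pd.__version__)
print("tensorflow:", tf.__version__)
```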
Figure 4-5 shows the Visualization helper class: code to visualize the results via a confusion matrix, a chart of the anomalies, and a chart of the errors (the difference between predictions and ground truth) during training.
Figure 4-5. Visualization helpers
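The book's exact helper class is not reproduced here; a hedged sketch covering the three plots the text mentions, with assumed method names, could look like this:

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

class Visualization:
    labels = ["Normal", "Anomaly"]

    def draw_confusion_matrix(self, y_true, y_pred):
        # Heatmap of true vs. predicted classes.
        matrix = confusion_matrix(y_true, y_pred)
        plt.figure(figsize=(6, 5))
        sns.heatmap(matrix, annot=True, fmt="d", cmap="Blues",
                    xticklabels=self.labels, yticklabels=self.labels)
        plt.xlabel("Predicted label")
        plt.ylabel("True label")
        plt.title("Confusion Matrix")
        plt.show()

    def draw_anomaly(self, y_true, error, threshold):
        # Reconstruction error per record, colored by true class, with the
        # anomaly threshold drawn as a horizontal line.
        y_true, error = np.asarray(y_true), np.asarray(error)
        plt.figure(figsize=(10, 5))
        for label, name in enumerate(self.labels):
            mask = y_true == label
            plt.scatter(np.where(mask)[0], error[mask], s=8, label=name)
        plt.axhline(threshold, color="red", linestyle="--", label="threshold")
        plt.xlabel("Record index")
        plt.ylabel("Reconstruction error")
        plt.legend()
        plt.show()

    def draw_error(self, history):
        # Training (and validation) loss per epoch -- the error between
        # predictions and truth while training.
        plt.figure(figsize=(10, 5))
        plt.plot(history.history["loss"], label="train loss")
        if "val_loss" in history.history:
            plt.plot(history.history["val_loss"], label="validation loss")
        plt.xlabel("Epoch")
        plt.ylabel("Loss")
        plt.legend()
        plt.show()
```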
You will use the example of credit card data to detect whether a transaction is normal/expected or abnormal/anomalous. Figure 4-6 shows the data being loaded into a Pandas dataframe.
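The loading step itself is short; a sketch of it, assuming the widely used credit card fraud CSV with a `Class` column (0 = normal, 1 = anomaly) and the file name below, might be:

```python
import pandas as pd

# Load the credit card transactions; the file name is an assumption.
df = pd.read_csv("creditcard.csv")

# 'Class' marks each transaction: 0 = normal/expected, 1 = abnormal/anomaly.
print(df.shape)
print(df["Class"].value_counts())
```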
You will collect 20,000 normal and 400 abnormal records. You can pick different ratios to try, but in general more normal data examples are better because you want to teach your autoencoder what normal data looks like. Too much abnormal data in training will teach the autoencoder that the anomalies are actually normal, which goes against your goal. Figure 4-7 shows sampling the dataframe and choosing the majority of normal data.
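A sketch of that sampling step, assuming the `df` and `Class` column from the loading sketch above:

```python
# Keep mostly normal records so the autoencoder learns what "normal" looks like.
normal = df[df["Class"] == 0].sample(20000, random_state=42)
anomaly = df[df["Class"] == 1].sample(400, random_state=42)

# Combine and shuffle the 20,400 sampled records.
data = pd.concat([normal, anomaly]).sample(frac=1, random_state=42)
print(data["Class"].value_counts())
```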