even if they are
not encodings of actual images.
In many real-world use cases, we work with large amounts of data (it could be images, audio, text, or almost anything else), but the structure that actually needs to be processed often lives in fewer dimensions than the raw data. That is why many machine learning workflows involve some form of dimensionality reduction. Classic techniques include singular value decomposition and principal component analysis. Similarly, in the deep learning space, variational autoencoders perform this dimensionality reduction.
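As a point of reference, a classical dimensionality reduction step such as PCA takes only a few lines. The snippet below is a minimal sketch (not from this chapter's listings) that uses scikit-learn's PCA to project 100-dimensional data down to 2 dimensions; the random array X and the component count are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 1,000 samples with 100 features each.
X = np.random.rand(1000, 100)

# Project onto the 2 directions of maximum variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (1000, 2)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```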
Before we dive into the mechanics of variational autoencoders, let's recap the normal autoencoders that you saw in this chapter. An autoencoder uses, at a minimum, an encoder layer and a decoder layer. The encoder compresses the input features into a latent representation, and the decoder expands that latent representation to generate the output, with the goal of training the model well enough to reproduce the input as the output. Any discrepancy between the input and the output can signify abnormal behavior or a deviation from what is normal, which is the basis of anomaly detection. In other words, the input gets compressed into a smaller representation with fewer dimensions than the input, and this is what we call the bottleneck. From the bottleneck, we try to reconstruct the input.
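The following is a minimal sketch of this encoder-bottleneck-decoder structure in Keras, assuming a setup similar to the earlier examples in this chapter; the feature count input_dim, the bottleneck size latent_dim, and the layer sizes are illustrative assumptions rather than the chapter's exact model.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

input_dim = 784   # illustrative feature count (e.g., a flattened 28x28 image)
latent_dim = 32   # size of the bottleneck

# Encoder: compress the input features down to the bottleneck.
inputs = layers.Input(shape=(input_dim,))
encoded = layers.Dense(latent_dim, activation="relu")(inputs)

# Decoder: expand the bottleneck back to the original input size.
decoded = layers.Dense(input_dim, activation="sigmoid")(encoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
# Train to reproduce the input as the output:
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)
```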
Now that you have the basic concept of normal autoencoders, let's look at variational autoencoders. In a variational autoencoder, instead of mapping the input to a fixed vector, we map the input to a distribution. The big difference is that the single bottleneck vector of the normal autoencoder is replaced with a mean vector and a standard deviation vector describing that distribution, and a latent vector sampled from the distribution serves as the actual bottleneck. Clearly this is very different from the normal autoencoder, where the input directly yields a latent vector.
First, an encoder network turns the input sample x into two parameters in a latent
space, which you can call z_mean and z_log_sigma. Then, you randomly sample similar
points z from the latent normal distribution that is assumed to generate the data,
via z = z_mean + exp(z_log_sigma) * epsilon, where epsilon is a random normal
tensor. Finally, a decoder network maps these latent space points back to the original
input data. Figure 4-58 depicts the variational autoencoder neural network.
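To make the sampling step concrete, here is a minimal sketch of how the encoder outputs and the formula z = z_mean + exp(z_log_sigma) * epsilon could be wired up in Keras. It follows the well-known Keras VAE pattern rather than any specific listing from this chapter; the layer sizes (784 inputs, 256 hidden units, a 2-dimensional latent space) are illustrative assumptions, while the names z_mean, z_log_sigma, and epsilon match the text.

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 2  # illustrative bottleneck size

# Encoder: turn the input sample x into the two latent-space parameters.
inputs = layers.Input(shape=(784,))          # e.g., a flattened 28x28 image
h = layers.Dense(256, activation="relu")(inputs)
z_mean = layers.Dense(latent_dim)(h)
z_log_sigma = layers.Dense(latent_dim)(h)

# Sampling step: z = z_mean + exp(z_log_sigma) * epsilon,
# where epsilon is a random normal tensor.
def sampling(args):
    z_mean, z_log_sigma = args
    epsilon = tf.random.normal(shape=tf.shape(z_mean))
    return z_mean + tf.exp(z_log_sigma) * epsilon

z = layers.Lambda(sampling)([z_mean, z_log_sigma])

# Decoder: map the sampled latent point back to the original input space.
decoder_h = layers.Dense(256, activation="relu")(z)
outputs = layers.Dense(784, activation="sigmoid")(decoder_h)
```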