gradient descent. Gradient descent is an optimization algorithm that
finds the gradient of the cost function and takes a single step in the direction of the local
minimum to generate values to use to adjust the weights and biases.
How much of a step you take is controlled by the
learning rate. The bigger the
learning rate, the larger the step you take at each iteration, and the quicker the local
minimum is approached. The smaller the learning rate, the longer the training takes
since the steps are smaller. However, a problem with too large of a learning rate is that it
could overshoot the local minimum entirely, leading to a complete failure to ever reach
the local minimum. Too small of a learning rate and the local minimum might take way
too long to reach. When the model starts to reach an ideal level of performance, the
gradients should be approaching 0 since the weights would have the cost function reach
a local minimum, signifying that the differences between the model’s predictions and
the actual predictions are very small.
In a process called
backpropagation, the gradients are calculated and the weights
are adjusted for each node in a layer, before the same process is done for the layer
before that until all of the layers have had their weights adjusted. The entire process of
passing the data through the model and backpropagating to readjust the weights is what
comprises the training process of a model in deep learning.
While the entire training process may sound complicated and computationally
heavy, GPUs help train the models much quicker because they are optimized to perform
the matrix calculations required by graphics processing.
Now that you know more about what deep learning is and how artificial neural
networks operate, a question might arise on
why we should use deep learning for
anomaly detection.
First of all, thanks to the advancements in GPU technology, we can train deep
learning models that are far deeper (many layers with lots of parameters) and on huge
data sets. This in itself leads to incredible performances by the networks and allows the
model to have much more powerful applications.
Not only has this led to a diverse set of models that are each suited for different
applications (image classification, video captioning, object detection, language
translation, generative models that can summarize articles, etc.), but the models keep
getting better and better at their respective tasks.
Chapter 3 IntroduCtIon to deep LearnIng
84
The models are also far more scalable than their traditional counterparts, since
deep learning models don’t hit a plateau in training accuracy as the number of data
entries increases, meaning we can apply deep learning models to massive volumes of
data. This attribute of deep learning models pairs very well with the trend of big data in
today’s society.
In this chapter, you will look at applying deep learning models to classifying
handwritten digits as an introduction to using two great, popular deep learning
frameworks in Python:
Do'stlaringiz bilan baham: |