To prevent overfitting, we add dropout. A 50% dropout means that every time the network goes to update the weights, it randomly skips half of them. We then compute the weighted sum of the inputs.
We take that sum and run softmax on it. Softmax takes these different outputs and turns them into probabilities, so that one of them is highest and they all lie between 0 and 1. Then, we compile the model to compute the loss as categorical_crossentropy. This is usually the loss one uses with one-hot encoding. Let's use the Adamax optimizer. There are different optimizers available in Keras, and you can look at the Keras documentation at https://keras.io.
Accuracy is an essential metric to monitor while we train the network, and we also want to compute accuracy at the very end to see how well the model has done.
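Putting these pieces together, here is a minimal sketch of how such a model might be defined and compiled in Keras; the layer sizes and the input dimension are placeholders rather than the book's exact values:

from keras.models import Sequential
from keras.layers import Dense, Dropout

num_features = 1000   # size of the bag-of-words vector (placeholder)
num_classes = 2       # spam vs. not spam

model = Sequential()
model.add(Dense(512, activation='relu', input_dim=num_features))
model.add(Dropout(0.5))                               # 50% dropout to fight overfitting
model.add(Dense(num_classes, activation='softmax'))   # turn outputs into probabilities

# categorical_crossentropy pairs naturally with one-hot encoded targets
model.compile(loss='categorical_crossentropy',
              optimizer='adamax',
              metrics=['accuracy'])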
We then run fit on the training set. d_train_inputs is the training inputs, the matrix bag-of-words model, and d_train_outputs is the training outputs, the one-hot encoding. We are going to say that we want 10 epochs, which means it will go through the entire training set ten times, and a batch size of 16, which means it will go through 16 rows, compute the average loss, and then update the weights.
After the model has been fit, which means it has been trained, we evaluate it on the test set. It is not until this point that it actually looks at the test data. The scores that come out are the loss and whatever other metrics we have, which in this case is accuracy. Therefore, we'll just show the accuracy times 100 to get a percentage, and we'll return the scores.
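A sketch of a train_and_test helper along these lines, assuming the model definition above is wrapped in a build_model() helper and that d_train_outputs, d_test_inputs, and d_test_outputs hold the one-hot training labels and the test data (those names are assumptions):

def train_and_test(d_train_inputs, d_train_outputs, d_test_inputs, d_test_outputs):
    model = build_model()  # assumed helper wrapping the Sequential definition above
    # 10 passes over the training set, updating the weights every 16 rows
    model.fit(d_train_inputs, d_train_outputs, epochs=10, batch_size=16)
    # the model sees the test set only now; returns [loss, accuracy]
    scores = model.evaluate(d_test_inputs, d_test_outputs)
    print("Accuracy: %.2f%%" % (scores[1] * 100))
    return scores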
Now, let's build that split again, which is the k-fold split with five different folds:
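A minimal sketch, assuming scikit-learn's KFold:

from sklearn.model_selection import KFold

# five folds: each pass trains on 4/5 of the data and tests on the remaining 1/5
kfold = KFold(n_splits=5, shuffle=True)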
We collect the scores. For each split, we're going to run our train_and_test function and save the scores. Here, it is running on each split. If you scroll, you will see the epochs going by. We can see that the accuracy on the training input increases per epoch. Now, if this gets really high, you might start worrying about overfitting, but after the 10 epochs, we evaluate on the testing set, which the model has never seen before. This gives us the accuracy number for the testing set. Then, we'll do it all again for the next split and get a different accuracy. We'll do this a few more times until we have five different numbers, one for each split.
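A sketch of that loop, assuming the kfold object above and that d_inputs and d_outputs hold the full bag-of-words matrix and the one-hot labels (those names are assumptions):

scores = []
for train_idx, test_idx in kfold.split(d_inputs):
    # run one full train/evaluate cycle on this split and keep its [loss, accuracy]
    fold_scores = train_and_test(d_inputs[train_idx], d_outputs[train_idx],
                                 d_inputs[test_idx], d_outputs[test_idx])
    scores.append(fold_scores)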
The average is found as follows:
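A minimal sketch, assuming scores holds the per-fold [loss, accuracy] pairs collected above:

import numpy as np

# average the accuracy column across the five folds and report it as a percentage
print("Average accuracy: %.2f%%" % (np.mean([s[1] for s in scores]) * 100))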
Here, we get 95%, which is very close to what we got by using the random forest. We didn't use this neural network example to show that we can get 100%; we used it to demonstrate an alternative way to detect spam besides the random forest method.
Summary
In
this chapter, we covered a brief introduction to neural networks, proceeded with feed-
forward neural networks, and looked at a program to identify the genre of a song with
neural networks. Finally, we revised our spam detector from earlier to make it work with
neural networks.
In the next chapter, we'll look at deep learning and learn about convolutional neural
networks.
5
Deep Learning
In this chapter, we'll cover some of the basics of deep learning. Deep learning refers to
neural networks with lots of layers. It's
kind of a buzzword, but the technology behind it is
real and quite sophisticated.
The term has been rising in popularity along with machine learning and artificial
intelligence, as shown in this Google trend chart:
As stated by some of the inventors of deep learning methods, the primary advantage of deep learning is that adding more data and more computing power often produces more accurate results, without significant additional engineering effort.
In this chapter, we are going to be looking at the following:
Deep learning methods
Identifying handwritten mathematical symbols with CNNs
Revisiting the bird species identifier to use images