Identifying handwritten mathematical symbols with CNNs
This section deals with building a CNN to identify handwritten mathematical symbols.
We're going to use the HASYv2 dataset, which contains about 168,000 images from 369 different classes, each representing a different symbol. This dataset is a more complex analog of the popular MNIST dataset of handwritten digits.
The following diagram depicts the kind of images that are available in this dataset:
And here, we can see a graph showing how many images each symbol has:
Many symbols have only a few images, while a few symbols have a great many. The code to import an image is as follows:
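A minimal sketch of that code, assuming the dataset has been unpacked into a HASYv2/ folder (the exact filename shown is illustrative; any file from the hasy-data/ folder works):

    from IPython.display import Image

    # Display one sample image inline in the notebook
    Image(filename="HASYv2/hasy-data/v2-00010.png")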
We begin by importing the Image class from the IPython library, which allows us to show images inside a Jupyter Notebook. Here's one image from the dataset:
This is an image of the letter A. Each image is 32 x 32 pixels. The images are stored in RGB format even though they don't strictly need to be: the three channels are predominantly black and white, effectively grayscale. We're going to use these three channels anyway. We then proceed to import the csv module, which allows us to load the dataset:
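A sketch of the imports, assuming Pillow and the standalone keras package are installed (with newer TensorFlow, the same utilities live under tensorflow.keras); PIL's Image is aliased so it doesn't clash with the IPython Image imported earlier, and numpy is pulled in for the array work that follows:

    import csv
    import numpy as np
    from PIL import Image as pil_image
    from keras.preprocessing import image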
This CSV file lists all the different filenames and class names. We import the Image class from PIL, which allows us to load each image, and preprocessing.image from Keras, which allows us to convert the images into numpy arrays.
Let's then go through the data file, taking a closer look at every filename, loading the image, and recording which class it belongs to:
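The loop might look like the following sketch; the label filename (hasy-data-labels.csv) and the column layout (filename first, class name third) follow the HASYv2 distribution:

    images = []
    classes = set()
    with open("HASYv2/hasy-data-labels.csv") as csvfile:
        csvreader = csv.reader(csvfile)
        count = 0
        for row in csvreader:
            if count > 0:          # skip the CSV header row
                # row[0] is the filename, row[2] is the class name
                img = image.img_to_array(pil_image.open("HASYv2/" + row[0]))
                img /= 255.0       # scale pixel values from 0-255 to 0-1
                images.append((row[0], row[2], img))
                classes.add(row[2])
            count += 1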
The immediate next step is to save the images and the classes using the CSV reader. We set a counter so that we skip the first row, which is the header of the CSV file. Only after that do we open the image, whose filename is in the first column of each row, and convert it into an array. The result has dimensions of 32 x 32 x 3, interpreted as 32 width, 32 height, and 3 channels (RGB).
These three channels hold numbers between 0 and 255. Those are typical pixel values, but they are not good inputs for a neural network; we want values that lie between 0 and 1, or between -1 and 1. To achieve this, we divide each pixel value by 255. To make things easier, we collect the filename, the class name, and the image matrix into a tuple and put it into our images list. We also record the set of class names. The following snippet illustrates this in greater depth:
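Continuing the sketch from above, inspecting the first entry would be:

    # Print the first (filename, class name, image array) tuple
    print(images[0])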
The file is named hasy-data/v2-….png. A is the name of the class, followed by the array. The array has dimensions 32 x 32 x 3; the innermost, last dimension holds the 3 channels. Each 1.0 depicts the color white, because we divided everything by 255 as mentioned earlier.
We have about 168,000 images in the HASYv2 dataset:
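A quick sanity check on the list we just built:

    len(images)    # about 168,000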
We then shuffle the data and split it on an 80% train, 20% test basis. As seen in the following code block, we first shuffle, then split the images:
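A sketch of the shuffle and split, using Python's random module:

    import random

    random.shuffle(images)
    split_idx = int(0.8 * len(images))   # 80% train, 20% test
    train = images[:split_idx]
    test = images[split_idx:]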
Because we use these tuples with three different values, we're ultimately going to need to collect all of that into matrices:
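One way to build those matrices, stacking the image arrays with the numpy import from earlier (the variable names are illustrative):

    # The image matrix is the third element of each tuple
    train_input = np.asarray(list(map(lambda row: row[2], train)))
    test_input = np.asarray(list(map(lambda row: row[2], test)))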
We need to collect the images as well as the labels. To collect the images, we go through each row and take the third element, which is the image matrix, and stack them all together into a numpy array. The same is done for both the train and the test datasets.
For the outputs, we need to pick out the second value of each tuple. These are still strings, such as a and the other symbol names, so we need to convert them into a one-hot encoding before they can be used by a neural network.
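A sketch of that encoding, here using scikit-learn's LabelBinarizer (one of several ways to one-hot encode string labels):

    from sklearn.preprocessing import LabelBinarizer

    # Fit on the full set of class names, then one-hot encode
    # the class string (the second element of each tuple)
    label_binarizer = LabelBinarizer()
    label_binarizer.fit(sorted(classes))
    train_output = label_binarizer.transform([row[1] for row in train])
    test_output = label_binarizer.transform([row[1] for row in test])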