please and plz, they're not regarded as similar for the bag of words model. Word2Vec can figure out that some words are similar to each other, and we can exploit that fact to get better performance when doing machine learning with text.
In Word2Vec, each word itself is a vector, with perhaps 300 dimensions. For example, in a
pre-trained Google Word2Vec model that examined millions or billions of pages of text, we
can see that cat, dog, and spatula are 300-dimensional vectors:
Cat = <0.012, 0.204, ..., -0.275, 0.056> (300 dimensions)
Dog = <0.051, -0.022, ..., -0.355, 0.227>
Spatula = <-0.191, -0.043, ..., -0.348, 0.398>
Similarity (distance) between cat and dog: 0.761
Similarity between cat and spatula: 0.124
If we compare the dog and cat vectors, we get a similarity of 0.761, or about 76%. If we do the same with cat and spatula, we get 0.124. It's clear that Word2Vec learned that dog and cat are similar words but cat and spatula are not. Word2Vec uses neural networks to learn these word vectors. At a high level, a neural network is similar to a random forest, a decision tree, and other machine learning techniques in that it is given a bunch of inputs and a bunch of outputs, and it learns how to predict the outputs from the inputs.
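If you want to try this comparison yourself, one way (a sketch, not something we build in this chapter) is to load the pre-trained Google News vectors with the gensim library; the file name below assumes you have downloaded the multi-gigabyte GoogleNews-vectors-negative300.bin file separately:

from gensim.models import KeyedVectors

# Load the pre-trained 300-dimensional Google News vectors
# (the .bin file must be downloaded separately and is several gigabytes)
model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

print(model['cat'].shape)                  # (300,) -- each word is a 300-dimensional vector
print(model.similarity('cat', 'dog'))      # roughly the 0.761 quoted above
print(model.similarity('cat', 'spatula'))  # roughly the 0.124 quoted above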
For Word2Vec, the input is a single word, the word whose vector we want to learn, and the
output is its nearby words from the text. Word2Vec also supports the reverse of this input-
output configuration. Thus, Word2Vec learns the word vectors by remembering each word's context words. So, dog and cat will have similar word vectors because these two words are used in similar ways, like she pet the dog and she pet the cat. Neural networking with Word2Vec can take one of two forms because Word2Vec supports two different techniques for training.
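To make that input-output idea concrete, here is a small illustrative sketch (the sentence and window size are made up for the example) that lists each word together with its nearby words, in the word-to-neighbours configuration described above:

# Illustrative only: pair each word with its neighbours within a window of 2
sentence = "she pet the dog".split()
window = 2

pairs = []
for i, word in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((word, sentence[j]))

print(pairs)
# 'dog' is paired with 'pet' and 'the'; in the sentence with 'cat' instead,
# 'cat' gets the same neighbours, which is why their vectors end up similar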
The first technique is known as continuous bag of words, where the context words are the
input, leaving out the middle word and the word whose vector we're learning, the middle
word, is the output. In the following diagram, you can see three words before and after the
word
channel
:
Those are the context words. The continuous bag of words model slides over the whole
sentence with every word acting as a center word in turn. The neural network learns the
300-dimensional vectors for each word so that the context vectors can predict the center word given
the context words. In other words, it can predict the output given its inputs.
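As a rough sketch of that sliding window (the sentence here is invented for illustration and is not the one from the diagram), this is how (context words, center word) training examples can be generated, with three words on either side of the center word:

# Illustrative only: slide a window over the sentence, using every word
# as the center word in turn, as continuous bag of words does
sentence = "you should subscribe to this channel and turn on notifications".split()
window = 3

for i, center in enumerate(sentence):
    context = sentence[max(0, i - window):i] + sentence[i + 1:i + window + 1]
    print(context, "->", center)
# When 'channel' is the center word, the context is the three words before
# and after it, and those context words are the input used to predict 'channel'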
In the second technique, we're going to flip this. This is known as skip-gram, where the middle word is the input and its context words are the output.
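If you train your own vectors with gensim, choosing between the two techniques is a single parameter of the Word2Vec class. The following is only a sketch with a toy corpus; note that in older gensim versions the vector_size parameter is called size:

from gensim.models import Word2Vec

# Toy corpus for illustration; real training needs far more text
sentences = [
    "she pet the dog".split(),
    "she pet the cat".split(),
]

# sg=0 trains with continuous bag of words, sg=1 trains with skip-gram
cbow_model = Word2Vec(sentences, vector_size=300, window=3, min_count=1, sg=0)
skipgram_model = Word2Vec(sentences, vector_size=300, window=3, min_count=1, sg=1)

print(cbow_model.wv.similarity('dog', 'cat'))  # meaningless on a toy corpus, but it runs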