Python Artificial Intelligence Projects for Beginners



Python Artificial Intelligence Projects for Beginners - Get up and running with 8 smart and exciting AI applications by Joshua Eckroth

and, or, and the. With TF-IDF, you can turn off the idf component and just keep the tf component, which depends only on the count (or a log of the count). You can use idf as well.
With random forests, you've got a choice of how many trees you use, which is the number
of estimators.
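As a minimal sketch of these two settings, the snippet below builds a scikit-learn pipeline with the idf component switched off and a fixed number of trees. The toy comments, the labels, and the choice of `make_pipeline` are assumptions made for illustration; they are not taken from the chapter's own code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import make_pipeline

# Hypothetical toy comments and labels (1 = unwanted, 0 = fine), for illustration only.
comments = [
    "check out my channel and subscribe",
    "subscribe to my channel for free gifts",
    "I love this song",
    "this video is the best",
]
labels = [1, 1, 0, 0]

pipeline = make_pipeline(
    CountVectorizer(stop_words="english"),                    # bag of words without common words
    TfidfTransformer(use_idf=False),                          # keep only the tf component
    RandomForestClassifier(n_estimators=20, random_state=0),  # 20 trees
)
pipeline.fit(comments, labels)

# Predict a label for a new, unseen comment.
prediction = pipeline.predict(["subscribe to my channel"])
print(prediction)
```

Setting `use_idf=True` restores the idf weighting, and `n_estimators` can be raised to grow more trees at the cost of longer training.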
scikit-learn also provides a feature that searches over all of these parameters and finds out which values work best.


Applications for Comment Classification
Chapter 3
[ 62 ]
We can make a little dictionary whose keys give the name of the pipeline step followed by the parameter name, and whose values give the options to try. For demonstration, we are going to try different limits on the number of words, such as a maximum of 1,000 or 2,000 words.
Using ngrams, we can consider just single words or also pairs of words. For stop words, we can use the English dictionary of stop words or use no stop words at all: in the first case we get rid of common words, and in the second case we keep them. Using TF-IDF, we use idf to state whether it's yes or no. For the random forest, we try 20, 50, or 100 trees. Using this, we can perform a grid search, which runs through all of the combinations of parameters and finds out what the best combination is.
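The options described above can be sketched as a parameter dictionary. The key names below assume a pipeline built with `make_pipeline`, which names each step after its lowercased class; the exact keys and values in the chapter's code may differ. `ParameterGrid` is used here only to count how many combinations a grid search would have to try.

```python
from sklearn.model_selection import ParameterGrid

# Keys follow the pattern <step name>__<parameter name>.
# Step names are assumptions based on make_pipeline's automatic naming.
parameters = {
    "countvectorizer__max_features": [1000, 2000],        # cap on vocabulary size
    "countvectorizer__ngram_range": [(1, 1), (1, 2)],     # single words, or words plus pairs
    "countvectorizer__stop_words": ["english", None],     # drop or keep common words
    "tfidftransformer__use_idf": [True, False],           # idf on or off
    "randomforestclassifier__n_estimators": [20, 50, 100],  # number of trees
}

# Every combination of the options above is one candidate configuration.
n_combinations = len(ParameterGrid(parameters))
print(n_combinations)  # 2 * 2 * 2 * 2 * 3 = 48
```

Each extra option multiplies the number of candidates, which is why a grid search over a large vocabulary takes noticeable time.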
So, let's give it our pipeline number 2, which includes the TF-IDF step. We will use fit to perform the search, and the outcome can be seen in the following screenshot:


Since there is a large number of words, it takes a little while, around 40 seconds, and
ultimately finds the best parameters. We can get the best parameters out of the grid search
and print them to see what the score is:
So, we got nearly 96% accuracy. The best combination used around 1,000 words, only single words, yes for getting rid of stop words, 100 trees in the random forest, and yes for the IDF in the TF-IDF computation. Here we've demonstrated not only bag of words, TF-IDF, and random forest, but also the pipeline feature and the parameter search feature known as grid search.
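The whole procedure can be sketched end to end with `GridSearchCV`. The data below is a hypothetical toy set, the grid is deliberately smaller than the one described above so that it runs in a moment, and `cv=3` is an assumption chosen to suit such a small sample; the scores it prints say nothing about the 96% result reported in the text.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Hypothetical toy comments (first six unwanted, last six fine), for illustration only.
comments = [
    "subscribe to my channel for free gifts",
    "check out my channel and subscribe",
    "click this link to win a free phone",
    "free gift cards, click the link now",
    "visit my page and subscribe for prizes",
    "win free money, check out this link",
    "I love this song so much",
    "this video is the best",
    "great voice and great lyrics",
    "the melody in this song is beautiful",
    "I listen to this every morning",
    "best music video of the year",
]
labels = [1] * 6 + [0] * 6

pipeline = make_pipeline(
    CountVectorizer(),
    TfidfTransformer(),
    RandomForestClassifier(random_state=0),
)

# A reduced grid; step names come from make_pipeline's automatic naming.
parameters = {
    "countvectorizer__ngram_range": [(1, 1), (1, 2)],
    "tfidftransformer__use_idf": [True, False],
    "randomforestclassifier__n_estimators": [20, 50],
}

# fit tries every combination with 3-fold cross-validation.
grid_search = GridSearchCV(pipeline, parameters, cv=3)
grid_search.fit(comments, labels)

print("Best score:", grid_search.best_score_)
for name, value in sorted(grid_search.best_params_.items()):
    print(name, "=", value)
```

After fitting, `best_params_` holds the winning combination and `best_score_` its mean cross-validated accuracy.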
Word2Vec models
In this section, we'll learn about Word2Vec, a modern and popular technique for working
with text. Usually, Word2Vec performs better than simple bag of words models. A bag of
words model only counts how many times each word appears in each document. Given
two such bag of words vectors, we can compare documents to see how similar they are.
This is the same as comparing the words used in the documents. In other words, if the two
documents have many similar words that appear a similar number of times, they will be
considered similar.
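To make the comparison concrete, here is a small sketch that turns three made-up documents into bag of words vectors and compares them. Cosine similarity is an assumption here, chosen as one common way to compare count vectors; the text does not name a specific measure.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical documents: the first two share most of their words.
documents = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "stocks fell sharply on friday",
]

# Each row counts how many times each vocabulary word appears in one document.
vectors = CountVectorizer().fit_transform(documents)

# Pairwise similarity between all documents (1.0 means identical word counts).
similarity = cosine_similarity(vectors)
print(similarity.round(2))
```

The first two documents score much higher with each other than either does with the third, because they share almost all of their words.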
But bag of words models have no information about how similar the words are. So, if two
documents do not use exactly the same words but do use synonyms, such as 
