Python Artificial Intelligence Projects for Beginners



Python Artificial Intelligence Projects for Beginners: Get up and running with 8 smart and exciting AI applications, by Joshua Eckroth

Pipeline. A Pipeline is really convenient: it brings together two or more steps so that all
the steps are treated as one. So, we will build a pipeline with the bag of words
CountVectorizer followed by the random forest classifier. Then we will print the pipeline
to see the steps it contains:
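A minimal sketch of such a pipeline, assuming scikit-learn is installed. The step names bag_of_words and random_forest are illustrative labels chosen here, not names taken from the book.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Each step is a (name, estimator) pair; the names are illustrative assumptions.
pipeline = Pipeline([
    ("bag_of_words", CountVectorizer()),
    ("random_forest", RandomForestClassifier(random_state=0)),
])

# Printing the pipeline shows its steps in order.
print(pipeline)

step_names = [name for name, _ in pipeline.steps]
print(step_names)
```

Every step except the last must be a transformer; the last step is the estimator that does the predicting.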


Applications for Comment Classification
Chapter 3
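We can let the pipeline name each step by itself by passing CountVectorizer and
RandomForestClassifier without explicit names (for example, through scikit-learn's
make_pipeline helper), and it will name them after the lowercased class names,
countvectorizer and randomforestclassifier: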
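As a sketch of the automatic naming, assuming the make_pipeline helper from scikit-learn is what generates the names:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# make_pipeline takes estimators only and derives each step name
# from the lowercased class name.
auto_pipeline = make_pipeline(
    CountVectorizer(),
    RandomForestClassifier(random_state=0),
)

auto_names = [name for name, _ in auto_pipeline.steps]
print(auto_names)
```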
Once the pipeline is created, you can just call fit on it and it will perform the rest: first
it performs a fit and then a transform with the CountVectorizer, followed by a fit with the
RandomForest classifier. That's the benefit of having a pipeline:
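The following sketch compares one call to fit on a pipeline with the same work done by hand. The comments and labels are toy data invented for illustration (1 for spam, 0 for not spam), not the dataset used in the chapter.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Toy data, invented for this example only.
texts = [
    "check out my channel and subscribe",
    "free gift cards click this link",
    "subscribe to my channel for free stuff",
    "this song is amazing",
    "love the guitar solo in this video",
    "great performance by the band",
]
labels = [1, 1, 1, 0, 0, 0]

# One call: fit_transform on the vectorizer, then fit on the forest.
pipeline = make_pipeline(CountVectorizer(), RandomForestClassifier(random_state=0))
pipeline.fit(texts, labels)

# The same steps written out by hand.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)
forest = RandomForestClassifier(random_state=0)
forest.fit(counts, labels)

new_texts = ["subscribe to my channel", "beautiful song"]
pipeline_predictions = pipeline.predict(new_texts)
manual_predictions = forest.predict(vectorizer.transform(new_texts))
print(pipeline_predictions, manual_predictions)
```

With the same random_state, both routes train on identical count matrices, so their predictions agree.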
Now you call score. The pipeline knows that when we are scoring, it has to run the data
through the bag of words CountVectorizer, followed by predicting with the
RandomForestClassifier:
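A small sketch of scoring, again on invented toy comments rather than the chapter's data. For a classifier, score returns the mean accuracy as a fraction between 0 and 1.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Toy training and test data, invented for this example (1 = spam, 0 = not spam).
train_texts = [
    "check out my channel and subscribe",
    "free gift cards click this link",
    "win a free phone click the link",
    "this song is amazing",
    "love the guitar solo in this video",
    "the lyrics are so beautiful",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["subscribe and click the link", "amazing video and song"]
test_labels = [1, 0]

pipeline = make_pipeline(CountVectorizer(), RandomForestClassifier(random_state=0))
pipeline.fit(train_texts, train_labels)

# score() transforms the raw text with the fitted vectorizer, predicts,
# and compares the predictions with the true labels.
accuracy = pipeline.score(test_texts, test_labels)

# The same number computed explicitly from predictions.
explicit_accuracy = accuracy_score(test_labels, pipeline.predict(test_texts))
print(accuracy, explicit_accuracy)
```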
This whole procedure will produce a score of about 94%. We can also predict a single
example with the pipeline. For example, imagine we have a new comment after the model
has been trained, and we want to know whether the comment the user has just typed is
legitimate or whether it's spam:
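As a hedged illustration of predicting one comment: the training comments and the new comment below are made up for this sketch, and the label convention (1 for spam, 0 for not spam) is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Toy training data, invented for this example (1 = spam, 0 = not spam).
texts = [
    "check out my channel and subscribe",
    "free gift cards click this link",
    "visit my page and subscribe please",
    "this song is amazing",
    "great performance by the band",
    "this video made my day",
]
labels = [1, 1, 1, 0, 0, 0]

pipeline = make_pipeline(CountVectorizer(), RandomForestClassifier(random_state=0))
pipeline.fit(texts, labels)

# predict() expects an iterable of documents, so a single comment
# is wrapped in a one-element list.
new_comment = "please subscribe to my new channel"
prediction = pipeline.predict([new_comment])
print(prediction[0])
```

Passing the bare string instead of a list would make the vectorizer reject the input, since it expects a collection of documents.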


As seen, it's detected correctly; but what about the following comment:
To overcome this, and before deploying this classifier into an environment where it predicts
whether a new comment is spam or not as someone types it, we will use our pipeline with
cross-validation to figure out how accurate it is. We find in this case that the average
accuracy was about 94%:
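A sketch of cross-validating a whole pipeline, assuming scikit-learn's cross_val_score. The data is an invented toy set and the choice of 3 folds is arbitrary, so the resulting numbers say nothing about the chapter's 94% figure.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy data, invented for this example (1 = spam, 0 = not spam).
texts = [
    "check out my channel and subscribe",
    "free gift cards click this link",
    "subscribe to my channel for free stuff",
    "click here to win money now",
    "visit my page and subscribe please",
    "win a free phone click the link",
    "this song is amazing",
    "love the guitar solo in this video",
    "great performance by the band",
    "the lyrics are so beautiful",
    "this video made my day",
    "best song of the year",
]
labels = [1] * 6 + [0] * 6

pipeline = make_pipeline(CountVectorizer(), RandomForestClassifier(random_state=0))

# The vectorizer is refit inside every fold, so the held-out comments
# never influence the vocabulary used for training.
scores = cross_val_score(pipeline, texts, labels, cv=3)
mean_accuracy = scores.mean()
print(scores, mean_accuracy)
```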
It's pretty good. Now let's add TF-IDF to our model to make it more precise:
This will be placed after CountVectorizer. After we have produced the counts, we can
then produce a TF-IDF score for these counts. Now we will add this to the pipeline and
perform another cross-validation check, which gives about the same accuracy:
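One way this could look, assuming scikit-learn's TfidfTransformer is the TF-IDF step; the toy comments and the 3-fold setting are illustrative choices, not the book's.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy data, invented for this example (1 = spam, 0 = not spam).
texts = [
    "check out my channel and subscribe",
    "free gift cards click this link",
    "subscribe to my channel for free stuff",
    "click here to win money now",
    "visit my page and subscribe please",
    "win a free phone click the link",
    "this song is amazing",
    "love the guitar solo in this video",
    "great performance by the band",
    "the lyrics are so beautiful",
    "this video made my day",
    "best song of the year",
]
labels = [1] * 6 + [0] * 6

# Counts first, then TF-IDF weighting of those counts, then the classifier.
tfidf_pipeline = make_pipeline(
    CountVectorizer(),
    TfidfTransformer(),
    RandomForestClassifier(random_state=0),
)

tfidf_names = [name for name, _ in tfidf_pipeline.steps]
tfidf_scores = cross_val_score(tfidf_pipeline, texts, labels, cv=3)
print(tfidf_names)
print(tfidf_scores.mean())
```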


This shows the steps in the pipeline:
The output lists CountVectorizer, a TF-IDF transformer, and RandomForestClassifier.
Notice that CountVectorizer can convert the text to lower case or keep the case used in
the dataset; it is also up to us to decide how many words we want to have. We can
either use single words or bigrams, which would be pairs of words, or trigrams, which
would be triples of words. We can also remove stop words, which are really common English
words such as
