Hands-On Machine Learning with Scikit-Learn and TensorFlow


Chapter 9: Unsupervised Learning Techniques





Okay, that’s our baseline: 96.7% accuracy. Let’s see if we can do better by using K-Means as a preprocessing step. We will create a pipeline that will first cluster the training set into 50 clusters and replace the images with their distances to these 50 clusters, then apply a logistic regression model.
Although it is tempting to set the number of clusters to 10, since there are 10 different digits, this is unlikely to perform well, because there are several different ways to write each digit.
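from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Step 1 ("kmeans"): replace each image with its distances to 50 cluster centroids
# Step 2 ("log_reg"): classify those 50 distance features
pipeline = Pipeline([
    ("kmeans", KMeans(n_clusters=50)),
    ("log_reg", LogisticRegression()),
])
pipeline.fit(X_train, y_train)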
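To see what the logistic regression actually receives, here is a minimal sketch that fits `KMeans` on its own and calls `transform()`, which returns each instance's distance to every cluster centroid. It assumes scikit-learn's bundled digits dataset (`load_digits`), a default `train_test_split`, and an arbitrary `random_state=42`; the names `X_train_dist` and `manual` are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Assumption: the bundled 8x8 digits dataset stands in for the training data
X_digits, y_digits = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X_digits, y_digits, random_state=42)

kmeans = KMeans(n_clusters=50, n_init=10, random_state=42)
kmeans.fit(X_train)

# transform() maps each 64-pixel image to its Euclidean distance from each of the 50 centroids
X_train_dist = kmeans.transform(X_train)
print(X_train.shape, "->", X_train_dist.shape)  # (1347, 64) -> (1347, 50)

# Sanity check: the first row matches the distances computed by hand
manual = np.linalg.norm(X_train[0] - kmeans.cluster_centers_, axis=1)
print(np.allclose(X_train_dist[0], manual, atol=1e-4))  # True
```

The 64 raw pixel features become 50 distance features, one per cluster; this reduced representation is what the `"log_reg"` step is trained on inside the pipeline.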
Now let’s evaluate this classification pipeline:
>>> pipeline.score(X_test, y_test)
0.9822222222222222
How about that? We almost divided the error rate by a factor of 2!
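The arithmetic behind "almost a factor of 2", using the two accuracies quoted above (the 96.7% baseline is rounded, so the ratio is approximate):

```python
baseline_accuracy = 0.967               # baseline logistic regression (rounded)
pipeline_accuracy = 0.9822222222222222  # K-Means + logistic regression pipeline

baseline_error = 1 - baseline_accuracy  # about 3.3%
pipeline_error = 1 - pipeline_accuracy  # about 1.8%
ratio = baseline_error / pipeline_error

print(f"{baseline_error:.1%} -> {pipeline_error:.1%}, ratio {ratio:.2f}")
# 3.3% -> 1.8%, ratio 1.86
```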
But we chose the number of clusters k completely arbitrarily, so we can surely do better.
Since K-Means is just a preprocessing step in a classification pipeline, finding a good value for k is much simpler than earlier: there’s no need to perform silhouette analysis or minimize the inertia; the best value of k is simply the one that results in the best classification performance during cross-validation. Let’s use GridSearchCV to find the optimal number of clusters:
from sklearn.model_selection import GridSearchCV

# "kmeans__n_clusters" targets the n_clusters parameter of the pipeline step named "kmeans"
param_grid = dict(kmeans__n_clusters=range(2, 100))
grid_clf = GridSearchCV(pipeline, param_grid, cv=3, verbose=2)
grid_clf.fit(X_train, y_train)
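Roughly speaking, the grid search amounts to the loop below: for each candidate k, set the `n_clusters` parameter of the `"kmeans"` step through its double-underscore name, score the whole pipeline with cross-validation, and keep the k with the best mean score. This is a simplified sketch, assuming the bundled digits dataset, a three-value grid instead of `range(2, 100)`, and `max_iter=5000` for `LogisticRegression` to avoid convergence warnings; the names `cv_scores` and `best_k` are illustrative.

```python
from sklearn.base import clone
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline

X_digits, y_digits = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X_digits, y_digits, random_state=42)

pipeline = Pipeline([
    ("kmeans", KMeans(n_clusters=50, n_init=10, random_state=42)),
    ("log_reg", LogisticRegression(max_iter=5000)),
])

cv_scores = {}
for k in [10, 30, 50]:  # assumption: a small grid so the sketch runs quickly
    # Fresh, unfitted copy of the pipeline with the "kmeans" step's n_clusters set to k
    candidate = clone(pipeline).set_params(kmeans__n_clusters=k)
    # Mean accuracy over 3 cross-validation folds of the training set
    cv_scores[k] = cross_val_score(candidate, X_train, y_train, cv=3).mean()

best_k = max(cv_scores, key=cv_scores.get)
print(cv_scores, "best k:", best_k)
```

Unlike `GridSearchCV` with its default `refit=True`, this loop does not retrain the best pipeline on the full training set afterward; it only picks the winning k.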
Let’s look at the best value for k and the performance of the resulting pipeline:
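A self-contained sketch of how these can be read back from a fitted `GridSearchCV` object through its `best_params_`, `best_score_`, and `score()` members. It assumes the bundled digits dataset and a reduced grid of three values so that it finishes quickly; the numbers it prints depend on that data and split and are not the results discussed in this section.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X_digits, y_digits = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X_digits, y_digits, random_state=42)

pipeline = Pipeline([
    ("kmeans", KMeans(n_clusters=50, n_init=10, random_state=42)),
    ("log_reg", LogisticRegression(max_iter=5000)),
])
# Assumption: a reduced grid instead of range(2, 100)
param_grid = dict(kmeans__n_clusters=[20, 40, 60])
grid_clf = GridSearchCV(pipeline, param_grid, cv=3)
grid_clf.fit(X_train, y_train)

print(grid_clf.best_params_)           # dict holding the best value of kmeans__n_clusters
print(grid_clf.best_score_)            # mean cross-validation accuracy for that value
print(grid_clf.score(X_test, y_test))  # test accuracy of the best pipeline, refit on X_train
```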
