Data Analytics (CS40003) Dr. Debasis Samanta Associate Professor



Date: 23.07.2022

Comments on k-Means algorithm

2. Choosing initial centroids:

    • A detailed calculation reveals that the number of possible clusterings to examine in a search for the global optimum is enormous: for n items and k clusters it grows roughly as k^n / k!.
    • For example, there are about 4.5 × 10^10 different ways to cluster 20 items into 4 clusters!
    • Thus, the strategy has its own limitation and is practical only if
      • The sample is relatively small (~100-1000), and
      • k is relatively small compared to n (i.e., k ≪ n).
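The count quoted above can be checked with a short sketch. It assumes that "ways to cluster" means partitions of n distinct items into k non-empty, unlabeled clusters (the Stirling number of the second kind); the slides do not say which counting formula they use, and the function name `count_clusterings` is illustrative only.

```python
from math import comb, factorial

def count_clusterings(n: int, k: int) -> int:
    """Ways to split n distinct items into k non-empty, unlabeled clusters.

    Assumption: this is the Stirling number of the second kind, computed
    here by inclusion-exclusion over the clusters left empty.
    """
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)

print(count_clusterings(20, 4))  # 45232115901, i.e. about 4.5 x 10^10
```

Even for 20 items the count exceeds 10^10, which is why an exhaustive search is feasible only for very small problems.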

Comments on k-Means algorithm

3. Distance Measurement:

  • To assign a point to the closest centroid, we need a proximity measure that quantifies the notion of “closest” for the objects being clustered.
  • Euclidean distance (L2 norm) is usually the best measure when the object points are defined in n-dimensional Euclidean space.
  • Another measure, cosine similarity, is more appropriate when the objects are documents.
  • Further, other types of proximity measures may be appropriate depending on the application.
  • For example, Manhattan distance (L1 norm), the Jaccard measure, etc.
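The measures named above can be sketched in plain Python as follows. This is a minimal illustration, not the course's implementation: the function names are hypothetical, points are assumed to be equal-length numeric tuples, cosine similarity assumes non-zero vectors, and the Jaccard measure is computed on non-empty sets.

```python
import math

def euclidean(x, y):
    """L2 norm of the difference between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """L1 norm of the difference between two points."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors (higher = closer)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def jaccard(a, b):
    """Jaccard similarity of two non-empty sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def closest_centroid(point, centroids, distance=euclidean):
    """Index of the centroid with the smallest distance to `point`."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))

print(closest_centroid((2, 0), [(0, 0), (3, 2)]))  # 0
```

Note that `closest_centroid` minimizes its `distance` argument, so a similarity such as cosine would have to be converted to a dissimilarity (for example 1 minus the similarity) before being passed in.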

Thus, depending on the proximity measure used, the error of a clustering (i.e., the objective function or convergence criterion) can be stated as follows.

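Data in Euclidean space (L2 norm):

The Euclidean distance (L2 norm) is used as the proximity measure, and the objective is to minimize the sum-of-squared error, denoted SSE and defined as

$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert c_i - x \rVert_2^{2}$$

Data in Euclidean space (L1 norm):

The Manhattan distance (L1 norm) is used as the proximity measure, and the objective is to minimize the sum-of-absolute error, denoted SAE and defined as

$$\mathrm{SAE} = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert c_i - x \rVert_1$$

Here C_i denotes the i-th cluster, c_i its centroid, and x a point assigned to C_i.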
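As a rough illustration of the two objective functions, the sketch below evaluates the sum-of-squared error and the sum-of-absolute error for a given clustering. The function names, the data layout (a list of point lists alongside a matching list of centroids), and the toy values are assumptions made for this example only.

```python
def sse(clusters, centroids):
    """Sum over all points of the squared L2 distance to their own centroid."""
    return sum(
        sum((xj - cj) ** 2 for xj, cj in zip(x, c))
        for points, c in zip(clusters, centroids)
        for x in points
    )

def sae(clusters, centroids):
    """Sum over all points of the L1 distance to their own centroid."""
    return sum(
        sum(abs(xj - cj) for xj, cj in zip(x, c))
        for points, c in zip(clusters, centroids)
        for x in points
    )

# Hypothetical toy clustering: two clusters of two 2-D points each.
clusters = [[(0, 0), (4, 0)], [(10, 10), (10, 12)]]
centroids = [(2, 0), (10, 11)]

print(sse(clusters, centroids))  # 10  (4 + 4 + 1 + 1)
print(sae(clusters, centroids))  # 6   (2 + 2 + 1 + 1)
```

Either value can be tracked across iterations as a convergence check: the algorithm can stop once the chosen error no longer decreases.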

Comments on k-Means algorithm

Distance with document objects

Suppose a set of n document objects is represented as a document-term matrix (DTM), with one row per document and one column per term (a typical form is shown below).
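To make the idea of a document-term matrix concrete, here is a minimal sketch that builds one from raw term counts. The three documents are invented for illustration, the helper name `build_dtm` is hypothetical, and tokenization is assumed to be simple lowercase whitespace splitting; the slides do not specify how the matrix entries are obtained.

```python
from collections import Counter

def build_dtm(documents):
    """Return (vocabulary, matrix): matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # a missing term counts as 0
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix

# Hypothetical toy corpus, not taken from the course material.
docs = [
    "data mining finds patterns in data",
    "clustering groups similar data",
    "text mining handles documents",
]
vocabulary, dtm = build_dtm(docs)
print(vocabulary)
for row in dtm:
    print(row)
```

Each row of the resulting matrix is a term-frequency vector for one document, which is the form on which a measure such as cosine similarity operates.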


