Python Programming for Biology: Bioinformatics and Beyond


Tim J. Stevens, Wayne Boucher

Figure 23.1. How many seashells in how many groups? Some views of data are better at distinguishing items and clusters than others.



Clustering

Clustering is the process of partitioning data units into discrete groups. This requires that the similarity (or difference) between units is measured, and that the members of each group are then allocated to give the arrangement that maximises the association of similar items and the separation of dissimilar ones. In practice, most of the clustering methods presented here cannot give an immediate analytical solution to this optimisation problem. Instead the process is iterative, with several cycles of improvement until a stable solution is found. As mentioned above, clustering may operate on data items of high dimensionality, represented as feature vectors. However, if the analysis is too slow or too complicated, the original data may first be transformed (projected) into lower-dimensionality data by methods like PCA before clustering.
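
As an illustration of this iterative style of optimisation, here is a minimal sketch assuming NumPy. `pca_project` projects the data onto its first principal components (computed via a singular value decomposition), and `k_means` then runs k-means, one common iterative clustering method, until no item changes cluster between cycles. The function names, the farthest-first choice of starting centres and the synthetic data are illustrative assumptions, not the book's own code.

```python
import numpy as np


def pca_project(data, n_dims=2):
    """Project the rows of data onto their first n_dims principal components."""
    centred = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_dims].T


def k_means(data, k, max_iter=100, seed=0):
    """Iteratively refine k cluster centres until no item changes cluster."""
    rng = np.random.default_rng(seed)
    # Farthest-first start (an illustrative choice): one random item, then
    # repeatedly the item farthest from every centre chosen so far
    centres = [data[rng.integers(len(data))]]
    for _ in range(1, k):
        gaps = np.min([((data - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(data[gaps.argmax()])
    centres = np.array(centres)

    labels = None
    for _ in range(max_iter):
        # Squared distance from every item to every centre, shape (n, k)
        dists = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # stable solution: no item changed cluster
        labels = new_labels
        # Move each centre to the mean of its members (keep it if the cluster is empty)
        centres = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
    return labels, centres


# Synthetic example: two well-separated groups in 10 dimensions
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 1.0, size=(20, 10)),
                  rng.normal(8.0, 1.0, size=(20, 10))])
projected = pca_project(data, n_dims=2)  # 10 dimensions -> 2
labels, centres = k_means(projected, k=2)
print(labels)
```

Each assignment and update step never increases the total within-cluster spread, so the loop settles on a stable arrangement, although that may be a local rather than a global optimum depending on the starting centres.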

Depending on the situation, clustering may work with prior knowledge about the number of clusters, e.g. knowing what the underlying data categories are. Alternatively, the number of clusters may be completely unknown, in which case it must be deduced or optimised. Generally, several trials are run, each with a different number of clusters, and within each trial there is a separate optimisation of how the data items are allocated among that number of clusters. The best number of clusters is then taken from the best overall arrangement across all the trials. Placing each data item in its own cluster would give maximum separation, but the objective is the best balance between the number of clusters and the degree of separation, rather than maximising separation alone.
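
One way to score such trials is the silhouette width, a widely used criterion swapped in here for illustration, since the text does not name a specific one. For each item it compares the mean distance to the other members of its own cluster (a) with the mean distance to the nearest other cluster (b) as (b - a) / max(a, b). The sketch below assumes NumPy, and the `trials` dictionary stands in for hypothetical optimised labels from runs with 2, 3 and 30 clusters.

```python
import numpy as np


def silhouette(data, labels):
    """Mean silhouette width of a clustering (requires at least two clusters).

    Values near 1 mean compact, well-separated clusters; an item alone in its
    cluster scores 0 by convention, so splitting everything apart is not rewarded.
    """
    # Euclidean distance between every pair of items, shape (n, n)
    dists = np.sqrt(((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = np.zeros(len(data))
    for i in range(len(data)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: score stays 0
        # a: mean distance to the other members of i's cluster (dists[i, i] is 0)
        a = dists[i, own].sum() / (n_own - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()


# Synthetic data: two groups of 15 points in 2D
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.5, size=(15, 2)),
                  rng.normal(5.0, 0.5, size=(15, 2))])

# Hypothetical labels standing in for the optimised result of each trial
trials = {
    2: np.repeat([0, 1], 15),             # one cluster per group
    3: np.repeat([0, 1, 2], [15, 7, 8]),  # second group split in two
    30: np.arange(30),                    # every item on its own
}
scores = {k: silhouette(data, labels) for k, labels in trials.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Putting every item in its own cluster scores 0 under this criterion, so the two-cluster arrangement wins rather than the one with maximum separation.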

Once clusters are defined, the result may be used to predict classification, i.e. to estimate which cluster a previously unseen piece of data belongs to. Making a prediction may be as simple as finding which cluster is closest. Alternatively, where classification is not so easy, more advanced approaches can be used, such as the supervised machine learning methods described in Chapter 24. These learn patterns from training data with known, fixed classifications before predictions are made. One of the machine learning methods presented later, the self-organising map, is notable because it is unsupervised (it needs no prior classifications) and can thus be viewed as an alternative to the linear clustering methods presented in this chapter.
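
The simple "closest cluster" prediction can be sketched as follows, assuming NumPy and that each cluster is represented by its centre, with closeness measured as Euclidean distance. `predict_cluster` and the example centres are hypothetical.

```python
import numpy as np


def predict_cluster(point, centres):
    """Return the index of the cluster centre closest to point (Euclidean)."""
    dists = np.linalg.norm(centres - point, axis=1)
    return int(dists.argmin())


# Hypothetical centres, e.g. taken from an earlier clustering run
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
print(predict_cluster(np.array([4.2, 4.6]), centres))  # prints 1
```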



