Python Programming for Biology: Bioinformatics and Beyond





Tim J. Stevens, Wayne Boucher

Figure 24.2. An overview of the k-nearest neighbour method. Data items are represented as vector locations within a space where the axes correspond to different features of the data. The training data vectors have known classifications and the classification of a query point is predicted from its k (in this case five) nearest neighbour points. A poll is taken of the categories present among the neighbour points and the most common category is the prediction for the query point.
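The voting procedure described in the caption can be sketched in a few lines of Python. This is a minimal illustration only, not the book's kNearestNeighbour function; the function name knnPredict, the layout of the training data as (featureVector, category) pairs and the default k=5 are assumptions made for this example.

```python
from collections import Counter

def knnPredict(query, trainingData, k=5):
    """Hypothetical helper: predict the category of query from its k nearest
    training points. trainingData is assumed to be a list of
    (featureVector, category) pairs."""
    # Squared Euclidean distance from the query to every training point
    scored = []
    for vector, category in trainingData:
        dist = sum((a - b) ** 2 for a, b in zip(query, vector))
        scored.append((dist, category))

    # Keep only the k closest points (sorting on the distance alone)
    scored.sort(key=lambda item: item[0])
    neighbours = scored[:k]

    # Poll the neighbour categories; the most common one is the prediction
    votes = Counter(category for _, category in neighbours)
    return votes.most_common(1)[0][0]

training = [((0.0, 0.0), 'A'), ((0.1, 0.2), 'A'), ((0.2, 0.1), 'A'),
            ((1.0, 1.0), 'B'), ((0.9, 1.1), 'B'), ((1.1, 0.9), 'B')]

print(knnPredict((0.15, 0.15), training, k=5))  # prints A (three 'A' votes to two 'B')
```

With k=5 the two nearest 'B' points are also polled, but the three 'A' neighbours still form the majority.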



Distance between feature vectors

Firstly, before introducing the main kNearestNeighbour function, we will consider a small function that measures the distance between two data points. In effect this says how similar two pieces of input data are: the smaller the distance between the feature measurements that make up two data points, the more similar they are. The distance measurement demonstrated here is simply the sum of the squared differences between each corresponding pair of values. For example, if we were measuring the ‘distance’ between two colours, this calculation means squaring the differences in the red, green and blue values and adding them together. Note that we need not take the square root of the sum, as you would when calculating the conventional (Euclidean) distance. The square root takes extra calculation time and is not really needed: we only need to find the closest data points, and because the square root always increases with its input, the point with the smallest distance also has the smallest squared distance. If the following function is not a good enough measure of similarity/difference⁴ for your data points, you can easily replace it with one that is, without affecting the main function.
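The claim that the square root can be skipped is easy to check: ranking points by squared distance gives the same order as ranking them by the conventional distance. The short sketch below illustrates this with a few colours; the colour values and the helper name squaredDistance are made up for the example, and math.dist (available from Python 3.8) is used only to compute the conventional Euclidean distance for comparison.

```python
import math

def squaredDistance(v1, v2):
    # Hypothetical helper: sum of squared differences, with no square root
    return sum((a - b) ** 2 for a, b in zip(v1, v2))

query = (200, 30, 40)  # an illustrative reddish RGB colour
colours = {'red': (255, 0, 0), 'orange': (255, 165, 0),
           'maroon': (128, 0, 0), 'blue': (0, 0, 255)}

# Rank the colours by squared distance and by true Euclidean distance
bySquared = sorted(colours, key=lambda name: squaredDistance(query, colours[name]))
byEuclid = sorted(colours, key=lambda name: math.dist(query, colours[name]))

print(bySquared)              # ['red', 'maroon', 'orange', 'blue']
print(bySquared == byEuclid)  # True
```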

The ‘distance’ function simply takes two feature vectors as inputs, which represent two data points. We use the built-in zip() function to extract equivalent feature values from both inputs at the same time; for example, if the inputs were colours, a and b would be the two red, then the two green and then the two blue values in turn. For each pair of values we calculate the difference, square it and add it to the total. The total is then returned at the end, so that it can be picked up by the calling function.



def getFeatureDistance(vector1, vector2):

    # Sum of squared differences between corresponding feature values
    distance = 0.0

    for a, b in zip(vector1, vector2):
        delta = a-b
        distance += delta * delta

    return distance
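As a quick usage check, the function can be applied to two RGB colours. The second function is a hypothetical drop-in alternative, the Manhattan (city-block) distance, which sums absolute differences instead of squared ones; because it takes the same two arguments and returns a single number, it could stand in for getFeatureDistance without changes to the calling code. Since zip() stops at the shorter input, the explicit length check in the alternative is an added safeguard assumed for this sketch, not part of the original function.

```python
def getFeatureDistance(vector1, vector2):
    # Squared Euclidean distance, as defined above
    distance = 0.0
    for a, b in zip(vector1, vector2):
        delta = a - b
        distance += delta * delta
    return distance

def getManhattanDistance(vector1, vector2):
    # Hypothetical alternative measure: sum of absolute differences
    if len(vector1) != len(vector2):
        raise ValueError('Feature vectors must have the same length')
    return float(sum(abs(a - b) for a, b in zip(vector1, vector2)))

red = (255, 0, 0)
orange = (255, 165, 0)

print(getFeatureDistance(red, orange))    # 27225.0 (only green differs, by 165)
print(getManhattanDistance(red, orange))  # 165.0
```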



