Python Programming for Biology: Bioinformatics and Beyond


Support vector machine predictions



Tim J. Stevens & Wayne Boucher


Using the SVM to make a prediction involves working out which side of the decision hyperplane, determined during training, a query feature vector lies on. Naturally, the prediction function takes a query vector as input, together with the training data and its known categories. We also pass in the function and parameters that allow the coincidence of feature vectors to be calculated using a kernel.

def svmPredict(query, data, knowns, supports, kernelFunc, kernelParams):

    prediction = 0.0
    for j, vector in enumerate(data):
        support = supports[j]

        if support > 0:
            coincidence = kernelFunc(vector, query, *kernelParams) + 1.0
            prediction += coincidence * support * knowns[j]

    return prediction

The SVM prediction is made by going through all of the training data points and finding those that are support vectors (support > 0). When a support vector is found its coincidence with (similarity to) the query is calculated using the kernel function. The degree of coincidence is multiplied by the amount of support for that training vector and by its known classification. Given that the known classification of the data vector is +1.0 or −1.0, this will either add to or subtract from the prediction total; effectively each support vector pulls the summation to the positive or the negative side. In the end, whether the prediction value is finally positive or negative determines the category of the query.
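The sign rule can be seen in a small self-contained sketch. The Gaussian kernel and the toy data below are illustrative stand-ins, not the book's training set, and the loop simply repeats the logic of svmPredict() inline:

```python
from math import exp

def kernelGauss(vectorA, vectorB, sigma):
    # Illustrative Gaussian (RBF) kernel on plain tuples
    d2 = sum((a - b) ** 2 for a, b in zip(vectorA, vectorB))
    return exp(-0.5 * d2 / sigma ** 2)

# Two training points, one per class; both are support vectors here
data = [(0.0, 0.0), (1.0, 1.0)]
knowns = [-1.0, 1.0]
supports = [1.0, 1.0]

query = (0.9, 0.8)  # close to the +1 cluster

prediction = 0.0
for j, vector in enumerate(data):
    if supports[j] > 0:
        coincidence = kernelGauss(vector, query, 0.5) + 1.0
        prediction += coincidence * supports[j] * knowns[j]

category = 1.0 if prediction > 0 else -1.0  # the sign gives the class
```

Because the query lies near the +1 cluster, the positive support vector dominates the sum and the prediction comes out positive.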

This next function, svmSeparation(), is used to test whether the training data was well separated into the two categories, i.e. whether the known classification is reproduced. We don’t use the above prediction function because we can reuse the pre-calculated kernelArray for speed. As before, the known classification is in the form of an array containing values of +1.0 or −1.0.

def svmSeparation(knowns, supports, kernelArray):

    score = 0.0
    nz = [i for i, val in enumerate(supports) if val > 0]

    for i, known in enumerate(knowns):
        prediction = sum(supports[nz] * knowns[nz] * kernelArray[nz, i])

        if known * prediction > 0.0: # same sign
            score += 1.0

    return 100.0 * score / len(knowns)

Making the prediction is done using the same logic as described for svmPredict(), although here we do it in one line using NumPy array operations, given that we don’t have to call the kernel function and can use the pre-calculated array instead. It is also notable that we calculate nz, a list of the indices of the non-zero support values, upfront to help reduce the number of calculations. With each prediction value, to test whether the classification is correct we check whether the prediction has the same sign as the known classification. At the end the function returns the percentage of correct classifications for the training data.
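The nz indexing step can be isolated with made-up numbers; the 4×4 kernelArray below is invented purely to show the mechanics of NumPy fancy indexing:

```python
import numpy as np

supports = np.array([0.0, 2.0, 0.0, 1.5])
knowns = np.array([1.0, -1.0, 1.0, 1.0])

# Pretend pre-computed kernel matrix; only column i is used per prediction
kernelArray = np.array([[1.0, 0.3, 0.2, 0.1],
                        [0.5, 1.0, 0.4, 0.3],
                        [0.2, 0.4, 1.0, 0.5],
                        [0.2, 0.3, 0.5, 1.0]])

nz = [i for i, val in enumerate(supports) if val > 0]  # indices 1 and 3

i = 0  # score training point 0 against the two support vectors
prediction = sum(supports[nz] * knowns[nz] * kernelArray[nz, i])
# = 2.0 * -1.0 * 0.5  +  1.5 * 1.0 * 0.2  =  -0.7
```

Only rows 1 and 3 of column 0 are touched, so the zero-support points cost nothing.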

To test out the support vector machine code we will make a fairly simple example that contains a discontinuous patchwork of points in a two-dimensional plane that have been placed into one of two categories, each in distinct regions. The following code goes through a grid of x and y positions, which are normalised to be between 0.0 and 1.0, to make an alternating chequerboard pattern for the categorisation (−1 or +1), except for the middle square, which is flipped the other way, resulting in a central cross. This will give a recognisable shape in the data that we can look for afterwards.

At each grid location the random.normal function from NumPy is used to make a cluster of points by specifying a set of values for the x and y axes. The category and the x and y values for each point are placed in the main catData list. This list is then shuffled to introduce a random order. The list of known categorisations is extracted as the last index (-1) for all catData items, and the training feature vectors as everything up to the last index ([:,:-1]).

from numpy import array, random

numPoints = 20
catData = []

for x in range(1, 6):
    for y in range(1, 6):
        xNorm = x/6.0 # Normalise range [0,1]
        yNorm = y/6.0

        if (x == 3) and (y == 3):
            category = -1.0
        elif (x % 2) == (y % 2):
            category = 1.0
        else:
            category = -1.0

        xvals = random.normal(xNorm, 0.2, numPoints)
        yvals = random.normal(yNorm, 0.2, numPoints)

        for i in range(numPoints): # xrange in Python 2
            catData.append( (xvals[i], yvals[i], category) )

catData = array(catData)
random.shuffle(catData)

knowns = catData[:,-1]
data = catData[:,:-1]
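The final slicing step, taking the last column as categories and the rest as features, can be checked on a tiny hand-made array (values invented for illustration):

```python
import numpy as np

# Three points, each row holding x, y and category
catData = np.array([[0.1, 0.2,  1.0],
                    [0.8, 0.9, -1.0],
                    [0.5, 0.4,  1.0]])

knowns = catData[:, -1]   # last column: the class labels
data = catData[:, :-1]    # all columns but the last: the feature vectors
```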

Running the SVM on this data involves passing in the known classifications, the training data, a Gaussian kernel function and the parameters for the kernel. After training, the svmSeparation() function can be used to assess how well the SVM separates the known categories.



params = (0.1,)
supports, steps, kernelArray = svmTrain(knowns, data, kernelGauss, params)
score = svmSeparation(knowns, supports, kernelArray)

print('Known data: %5.2f%% correct' % score)



