



$\lambda = 0.8$ gave the best results.
The RDA classifier uses a table of values of the discriminant functions $d_k(x)$ and assigns a test vector to the class with the minimal value of this function. Using this algorithm, a recognition ratio of 55.6% was achieved.
It is easy to notice that the larger the value of the discriminant function for a class, the smaller the probability that the vector belongs to this class. Let us consider an SVM binary classifier that deals with the $i$-th and $j$-th classes. Assume that this classifier assigns the vector $x$ to the class $i$. Then the difference between $d_i(x)$ and $d_{\min}(x) = \min\{d_k(x)\}$, $k = 1, 2, \ldots, n$, can be treated as a measure of reliability of such a classifier. Formally, this measure will be the weight of the vote and can be defined as:
$$1 - \frac{d_i(x) - d_{\min}(x)}{d_{\min}(x)} \tag{8}$$
Using this procedure, a weight is assigned to each binary classifier. These weights are applied to the voting table, so the votes are not treated equally. The protein is then assigned to the class with the maximum number of votes (which is now a real number). The recognition rate using this method is 61.8%.
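The weighted voting described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the example discriminant values, and the pairwise outcomes are hypothetical, and the sketch assumes strictly positive discriminant values so that the division in Eq. (8) is well defined.

```python
from collections import defaultdict


def vote_weight(d, winner):
    """Weight of one binary vote, Eq. (8): 1 - (d_i - d_min) / d_min."""
    d_min = min(d.values())
    if d_min <= 0:
        # Assumption of this sketch, not stated in the paper.
        raise ValueError("this sketch assumes strictly positive discriminant values")
    return 1.0 - (d[winner] - d_min) / d_min


def weighted_vote(pairwise_winners, d):
    """Add up the weighted votes and return the class with the largest total."""
    votes = defaultdict(float)
    for winner in pairwise_winners:
        votes[winner] += vote_weight(d, winner)
    best = max(votes, key=votes.get)
    return best, dict(votes)


# Hypothetical RDA discriminant values d_k(x); smaller means more likely.
d = {"a": 2.0, "b": 2.5, "c": 4.0}
# Hypothetical winners of the binary classifiers (a,b), (a,c), (b,c).
winners = ["b", "a", "c"]

label, votes = weighted_vote(winners, d)
print(label, votes)
```

In this toy case each class wins one binary contest, so unweighted voting would tie; the weights favour the class whose discriminant value is closest to the minimum.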
Results and conclusions
This paper presents a combined generative/discriminative classifier. This classifier uses the information provided
resolve the ill-posed estimation is to regularize the covariance matrix. First, the covariance matrices $\Sigma_k$ can be replaced by their average (the pooled covariance matrix), i.e. $\hat{\Sigma} = \sum_k \hat{\Sigma}_k / \sum_k N_k$, which leads to linear discriminant analysis (LDA). This assumes that all covariance matrices are similar. It is a very limited approach, so in regularized discriminant analysis (RDA) each covariance matrix can be estimated as:
$$\hat{\Sigma}_k(\lambda) = (1 - \lambda)\hat{\Sigma}_k + \lambda\hat{\Sigma} \tag{2}$$
where $0 \le \lambda \le 1$. The parameter $\lambda$ controls the degree of shrinkage of the individual class covariance matrix estimate toward the pooled estimate.
There is no universal value of the $\lambda$ parameter for all classification problems. This value must be chosen experimentally using a cross-validation procedure on the training data set.
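The shrinkage toward the pooled estimate in Eq. (2) is a convex combination of two matrices, which can be sketched in a few lines. The function name and the example matrices below are hypothetical, and the pooled matrix is taken as a given input rather than computed.

```python
def shrink_to_pooled(class_cov, pooled_cov, lam):
    """Eq. (2): (1 - lam) * class_cov + lam * pooled_cov, element by element."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [
        [(1.0 - lam) * s + lam * p for s, p in zip(row_s, row_p)]
        for row_s, row_p in zip(class_cov, pooled_cov)
    ]


# Hypothetical 2x2 covariance estimates, for illustration only.
class_cov = [[4.0, 1.0], [1.0, 2.0]]
pooled_cov = [[2.0, 0.0], [0.0, 2.0]]

shrunk = shrink_to_pooled(class_cov, pooled_cov, 0.5)
print(shrunk)
```

With `lam = 0` the class estimate is returned unchanged, and with `lam = 1` every class shares the pooled matrix, which corresponds to the LDA case mentioned above.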
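Further regularization is possible using another parameter, as follows:
$$\hat{\Sigma}_k(\lambda, \gamma) = (1 - \gamma)\hat{\Sigma}_k(\lambda) + \gamma\,\mathrm{tr}[\hat{\Sigma}_k(\lambda)]\, I / N \tag{3}$$
where $0 \le \gamma \le 1$ and $\mathrm{tr}[\hat{\Sigma}_k(\lambda)]$ is the sum of the eigenvalues of $\hat{\Sigma}_k(\lambda)$.
The parameter $\gamma$ controls the degree of shrinkage towards a multiple of the identity matrix for a given value of $\lambda$.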
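The second shrinkage step, Eq. (3), can be sketched in the same way. The function name and the example matrix are hypothetical. The sketch also assumes that the divisor written as $N$ in Eq. (3) is the dimension of the covariance matrix, as in the usual formulation of RDA; the text above does not define it.

```python
def shrink_to_identity(cov_lam, gamma):
    """Eq. (3): (1 - gamma) * cov + gamma * tr(cov) / dim * I."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    dim = len(cov_lam)  # assumption: the divisor in Eq. (3) is the matrix dimension
    trace = sum(cov_lam[i][i] for i in range(dim))
    diagonal_term = gamma * trace / dim
    return [
        [
            (1.0 - gamma) * cov_lam[i][j] + (diagonal_term if i == j else 0.0)
            for j in range(dim)
        ]
        for i in range(dim)
    ]


# Hypothetical covariance matrix already shrunk with some value of lambda.
cov_lam = [[3.0, 0.5], [0.5, 2.0]]

regularized = shrink_to_identity(cov_lam, 0.5)
print(regularized)
```

Under that assumption the trace of the matrix is preserved, while off-diagonal entries are scaled down by `1 - gamma`.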
The SVM classifier
The support vector machine (SVM) is a well-known large-margin classifier proposed by Vapnik [10]. This technique has been used in different application domains and has outperformed the traditional techniques. The basic concept behind the SVM classifier is to search for an optimal separating hyperplane which separates two classes. However, perfect separation is often not feasible, so slack variables $\xi_1, \xi_2, \ldots, \xi_n$ can be used to measure the amount of violation of the original constraints.
Let us consider a classifier whose decision function is given by:
$$f(x) = \mathrm{sign}(x^T w + b) \tag{4}$$
where $x$ denotes a feature vector and $w$ is a weight vector. Then the SVM algorithm minimizes the objective function:
$$\frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\xi_i \tag{5}$$
subject to: $y_i(w^T x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, \ldots, n$.
This problem leads to the so-called dual optimization problem and finally (considering a non-linear decision hyperplane and using the kernel trick) to:
$$f(x) = \mathrm{sign}\left(\sum_{i=1}^{N}\alpha_i y_i K(x_i, x) + b\right) \tag{6}$$
where $0 \le \alpha_i \le C$, $i = 1, 2, \ldots, N$, are nonnegative Lagrange multipliers, $x_i$ are the support vectors, $C$ is a cost parameter that controls the trade-off between allowing training errors and forcing rigid margins, and $K(x_i, x)$ is the kernel function.
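To make Eq. (6) concrete, the decision function can be evaluated directly once the support vectors, multipliers, and bias are known. The sketch below is illustrative only: the RBF kernel and its width are assumptions (the text above does not say which kernel was used), all numeric values are hypothetical, and a sum of exactly zero is mapped to +1 by convention.

```python
import math


def rbf_kernel(u, v, kernel_width=0.5):
    """K(u, v) = exp(-kernel_width * ||u - v||^2); an assumed kernel choice."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-kernel_width * squared_distance)


def svm_decision(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
    """Eq. (6): sign(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    total = b
    for alpha, y, sv in zip(alphas, labels, support_vectors):
        total += alpha * y * kernel(sv, x)
    return 1 if total >= 0 else -1


# Hypothetical trained model: two support vectors with opposite labels.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, 1]
alphas = [1.0, 1.0]
b = 0.0

near_positive = svm_decision((1.8, 1.9), support_vectors, alphas, labels, b)
near_negative = svm_decision((0.1, 0.2), support_vectors, alphas, labels, b)
print(near_positive, near_negative)
```

Only the support vectors enter the sum, so the cost of a prediction grows with their number rather than with the size of the training set.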
The proposed combined classifier
The SVM is a binary classifier, but protein fold recognition is a multi-class problem. Many methods have been proposed to deal with this issue. In our classifier we use the first and best-known method: the one-versus-one strategy with a max-win voting scheme.
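A one-versus-one scheme trains one binary classifier for every pair of classes, that is $n(n-1)/2$ classifiers for $n$ classes, and lets each of them cast one vote. The sketch below illustrates max-win voting with a stand-in binary rule; the `nearest_center` function and the class centers are hypothetical placeholders for trained binary SVMs.

```python
from collections import Counter
from itertools import combinations


def max_wins(classes, binary_decide, x):
    """Run every pairwise classifier on x and return the most voted class."""
    votes = Counter()
    for i, j in combinations(classes, 2):
        votes[binary_decide(i, j, x)] += 1
    # Ties are resolved by insertion order here, which is an arbitrary choice.
    return votes.most_common(1)[0][0], dict(votes)


# Hypothetical 1-D class centers standing in for trained binary classifiers.
centers = {"a": 0.0, "b": 5.0, "c": 10.0}


def nearest_center(i, j, x):
    """Toy binary rule: pick whichever of the two classes has the closer center."""
    return i if abs(x - centers[i]) <= abs(x - centers[j]) else j


predicted, tally = max_wins(list(centers), nearest_center, 4.0)
print(predicted, tally)
```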
The well-known LIBSVM library, version 2.89, was used in our research (Chang and Lin, [3]). Although the implementation of this library includes a one-versus-one strategy for multi-category problems, only the binary version of the classifier was used. LIBSVM


Bioinformatics
A combined SVM-RDA classifier for protein fold recognition…
by the generative classifier to improve the recognition rate of the discriminative classifier. The results using this method are presented in Table 1. It can be seen that the combined classifier achieves better results than the SVM or RDA classifiers alone.
The accuracy measure used in this paper is the standard $Q$ percentage accuracy (Baldi et al. [1]). Suppose there are $N = n_1 + n_2 + \ldots + n_p$ test proteins, where $n_i$ is the number of proteins which belong to the class $i$. Suppose that $c_i$ of the $n_i$ proteins are correctly recognized (as belonging to the class $i$). Then a total of $C = c_1 + c_2 + \ldots + c_p$ proteins are correctly recognized, and the total accuracy is $Q = C/N$.
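The $Q$ measure is a simple ratio of per-class counts, as the short sketch below shows. The function name and the counts are hypothetical and are not results from the paper.

```python
def q_accuracy(correct_per_class, total_per_class):
    """Q = C / N, with C the sum of the c_i and N the sum of the n_i."""
    if len(correct_per_class) != len(total_per_class):
        raise ValueError("both lists must have one entry per class")
    total_correct = sum(correct_per_class)  # C
    total_tested = sum(total_per_class)     # N
    return total_correct / total_tested


# Hypothetical counts for three classes.
n = [10, 20, 20]  # n_i: test proteins per class
c = [8, 10, 12]   # c_i: correctly recognized proteins per class

q = q_accuracy(c, n)
print(f"Q = {100 * q:.1f}%")
```

Because $Q$ pools all classes, large classes influence it more than small ones.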
