The method TelSKART© was presented for the first time during the conference Regional Research on Tourist Services Consumers, organized by the Polish Tourist Organization and the Ministry of Sport and Tourism (November 24-25 in Warsaw). Work published at: http://www.pot.gov.pl/dokumenty/do-pobrania/materia142y-szkoleniowe/Program%20konferencji%2024_25_11_2009.pdf/
Bioinformatics
BIO-ALGORITHMS AND MED-SYSTEMS
JOURNAL EDITED BY JAGIELLONIAN UNIVERSITY – MEDICAL COLLEGE
Vol. 7, No. 13, 2011, pp. 67-70
A COMBINED SVM-RDA CLASSIFIER
FOR PROTEIN FOLD RECOGNITION
Wiesław Chmielnicki¹, Katarzyna Stąpor²
¹ Jagiellonian University, Faculty of Physics, Astronomy and Applied Computer Science, Kraków, Poland
² Silesian University of Technology, Institute of Computer Science, Gliwice, Poland
Abstract:
Predicting the three-dimensional (3D) structure of a protein is a key problem in molecular biology. It is also an interesting problem for statistical recognition methods. There are many approaches to this problem, using both discriminative and generative classifiers. In this paper a classifier combining the well-known support vector machine (SVM) classifier with the regularized discriminant analysis (RDA) classifier is presented. It is tested on a real-world data set. The obtained results are promising and improve on previously published methods.
Keywords:
protein fold recognition, support vector machine, multi-class classifier, one-versus-one strategy
Introduction
Predicting the three-dimensional (3D) structure of a protein is a key problem in molecular biology. Proteins manifest their function through these structures, so it is very important to know not only the sequence of amino acids in a protein molecule, but also how this sequence is folded. The successful completion of many genome-sequencing projects has meant that the number of proteins with a known amino acid sequence is increasing quickly, but the number of proteins with a known 3D structure is still relatively small.
There is a variety of approaches to protein structure prediction. They range from those based on physical principles, through methods that rely on evolutionary information, to statistical methods based on machine-learning systems. An interesting survey of these methods can be found in Rychlewski et al. [22]. In this paper we focus on machine-learning algorithms (Stąpor [20]).
Several machine-learning methods for predicting protein folds from amino acid sequences have been proposed in the literature. Ding and Dubchak [5] experimented with support vector machine (SVM) and neural network (NN) classifiers. Shen and Chou [9] proposed an ensemble model based on the nearest neighbour. A modified nearest neighbour algorithm called K-local hyperplane (HKNN) was used by Okun [14]. Nanni [13] proposed an ensemble of classifiers: Fisher's linear classifier and the HKNN classifier.
There are two standard approaches to the classification task. Generative classifiers use training data to estimate a probability model for each class, and test items are then classified by comparing their probabilities under these models. Discriminative classifiers try to find the optimal frontiers between classes based on all the samples of the training data set.
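The contrast between the two approaches can be illustrated with a minimal one-dimensional sketch. It is not taken from the paper: the Gaussian class model, the threshold search, the function names and the toy data are all assumptions chosen only to make the distinction concrete.

```python
import math

def fit_gaussian(xs):
    """Generative step: estimate a per-class model (mean and variance)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var

def log_density(x, mean, var):
    """Log of the Gaussian density of x under one class model."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def generative_predict(x, models):
    """Pick the class under whose model x is most probable."""
    return max(models, key=lambda label: log_density(x, *models[label]))

def fit_threshold(xs, ys):
    """Discriminative step: choose the boundary directly by minimising
    training errors. Assumes labels 0/1 with class 1 on the higher side."""
    pts = sorted(xs)
    candidates = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    def errors(t):
        return sum((x > t) != bool(y) for x, y in zip(xs, ys))
    return min(candidates, key=errors)

# Hypothetical toy data: two well-separated classes on a line.
class0 = [1.0, 1.5, 2.0]
class1 = [4.0, 4.5, 5.0]
models = {0: fit_gaussian(class0), 1: fit_gaussian(class1)}
threshold = fit_threshold(class0 + class1, [0, 0, 0, 1, 1, 1])
```

Here `generative_predict(2.5, models)` compares the two class models, whereas the discriminative rule only checks on which side of `threshold` a point falls.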
This paper presents a classifier that combines the support vector machine (SVM), a discriminative classifier, with statistical regularized discriminant analysis (RDA), a generative classifier. The SVM technique has been used in different application domains and has outperformed traditional techniques. However, the SVM is a binary classifier, whereas protein fold recognition is a multi-class problem, and how to effectively extend a binary classifier to the multi-class case is still an ongoing research problem. Many methods have been proposed to deal with this issue.
One of the first and best-known methods is the one-versus-one strategy with the max-win voting scheme. In this strategy a binary classifier is trained for every pair of classes, and each binary classifier votes for its preferred class. Originally, the class with the maximum number of votes is recognized as the correct class. However, some of these binary classifiers are unreliable, and their votes still influence the final classification result. This paper presents a strategy that assigns a weight (which can be treated as a measure of reliability) to each vote, based on the values of the discriminant function from an RDA classifier.
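The voting schemes described above can be sketched as follows. This is only an illustration under stated assumptions: `one_vs_one_vote`, the lookup tables and the weight values are hypothetical, and the weights are placeholders rather than values of an RDA discriminant function.

```python
from collections import defaultdict
from itertools import combinations

def one_vs_one_vote(classes, binary_decision, weight=None):
    """Combine pairwise decisions into one predicted class.

    binary_decision(a, b) returns the winner (a or b) for that pair.
    weight(a, b) returns a non-negative reliability weight for the vote;
    when omitted, every vote counts 1 (plain max-win voting).
    """
    scores = defaultdict(float)
    for a, b in combinations(classes, 2):
        winner = binary_decision(a, b)
        scores[winner] += 1.0 if weight is None else weight(a, b)
    # max() keeps the first maximal element, so ties go to the class listed first.
    return max(classes, key=lambda c: scores[c])

# Hypothetical outcomes of the three pairwise classifiers for one test item.
winners = {("A", "B"): "A", ("A", "C"): "A", ("B", "C"): "B"}
# Hypothetical reliabilities: the two classifiers voting for A are weak.
reliability = {("A", "B"): 0.2, ("A", "C"): 0.1, ("B", "C"): 0.9}

classes = ["A", "B", "C"]
plain = one_vs_one_vote(classes, lambda a, b: winners[(a, b)])
weighted = one_vs_one_vote(classes, lambda a, b: winners[(a, b)],
                           lambda a, b: reliability[(a, b)])
```

With these toy numbers plain max-win voting selects class A (two votes against one), while the weighted scheme selects class B, because the single vote for B carries more weight than both votes for A together.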
The rest of this paper is organized as follows: Section 2 introduces the database and the feature vectors used in these experiments, Section 3 presents the basics of the RDA classifier, Section 4 briefly describes the basics of the SVM classifier, Section 5 describes the method of combining the classifiers, and Section 6 presents experimental results and conclusions.
The database and feature vectors
Using machine-learning methods requires databases that represent known protein sequences and their folds. This information must then be converted to a feature space representation.
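As a minimal sketch of such a conversion, the example below maps a sequence to a fixed-length vector of amino acid frequencies. The choice of amino acid composition as the feature, the function name and the sample sequence are assumptions made for illustration; the text above does not state which features are used.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequence):
    """Return a 20-dimensional vector of amino acid frequencies.

    Characters outside the 20 standard letters are ignored.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]

# Hypothetical short sequence, used only to show the output shape.
features = composition("MKVLAAGA")
```

Every sequence, whatever its length, is mapped to a vector of the same dimension, which is what a classifier operating on feature vectors needs.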