Sentence Classification and Evaluation



Given the k-gram sentence representation described so far, the next challenge was
selecting a machine learning classifier that could handle a large dataset (over
50,000 training points) efficiently. In this paper, we report results using the
Maximum Entropy method (Nigam et al., 1999) as implemented in the OpenNLP library
(Baldridge et al., 2002), which performed better than the Support Vector Machine and
Multinomial Naive Bayes methods on our dataset.


2014 CALL Conference, LINGUAPOLIS, www.antwerpcall.be
During training, a separate Maximum Entropy classifier was learned for each value of
k (1 through 3) and for each type of sequence (stemmed words and part-of-speech
tokens). Thus, when evaluating the algorithm, an ensemble of six classifiers was trained
for each of the two classification tasks (i.e., moves and steps). To classify a new
sentence, the outputs of the six classifiers were combined using a weighted combination
in which the weights corresponded to prior estimates for the performance of each
classifier (Lam & Suen, 1995).
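
The weighted combination described above could be sketched as follows. This is
only an illustration under stated assumptions: each classifier is assumed to emit
a probability distribution over labels, the weights are normalized before use, and
the function name, labels, and numeric values are hypothetical rather than taken
from the paper.

```python
from collections import defaultdict


def combine_weighted(outputs, weights):
    """Combine per-classifier label distributions by a weighted sum.

    outputs: list of dicts mapping label -> probability, one per classifier.
    weights: list of non-negative floats, one per classifier (assumed here to
             be prior performance estimates, e.g. from held-out data).
    Returns (best_label, combined_scores).
    """
    if len(outputs) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    scores = defaultdict(float)
    for dist, w in zip(outputs, weights):
        for label, p in dist.items():
            # Each classifier contributes in proportion to its normalized weight.
            scores[label] += (w / total) * p

    best = max(scores, key=scores.get)
    return best, dict(scores)


# Six classifiers: one per (sequence type, k) pair, as in the text.
configs = [(seq, k) for seq in ("stemmed", "pos") for k in (1, 2, 3)]

# Hypothetical outputs and weights for a single sentence.
outputs = [{"Move 1": 0.6, "Move 2": 0.4} for _ in configs]
weights = [0.5, 0.6, 0.55, 0.4, 0.45, 0.42]

label, scores = combine_weighted(outputs, weights)
```

Normalizing the weights keeps the combined scores on the same scale as the
individual distributions, so they can still be read as a distribution over labels.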
The set of sentences in our dataset was highly unbalanced with respect to both moves
and steps, which poses a special challenge to machine learning approaches, as it often
results in classifiers that are biased towards the majority class and away from the
minority class. To tackle this problem, during the training stage, the sentences from
steps that had fewer than 1,500 examples were oversampled so that each step had at
least 1,500 training sentences.
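
A minimal sketch of this oversampling step is given below. It assumes random
sampling with replacement, which the text does not specify; the function name,
the seed parameter, and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def oversample_min_count(sentences, labels, min_count=1500, seed=0):
    """Return a training set in which every label has at least min_count items.

    Labels already at or above min_count are left unchanged; smaller ones are
    topped up by sampling their own sentences with replacement (an assumption).
    """
    rng = random.Random(seed)

    by_label = defaultdict(list)
    for sentence, label in zip(sentences, labels):
        by_label[label].append(sentence)

    out_sentences, out_labels = [], []
    for label, items in by_label.items():
        shortfall = max(0, min_count - len(items))
        # rng.choices draws with replacement; k=0 yields an empty list.
        resampled = items + rng.choices(items, k=shortfall)
        out_sentences.extend(resampled)
        out_labels.extend([label] * len(resampled))
    return out_sentences, out_labels


# Toy usage with a small threshold instead of 1,500.
toy_sentences = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
toy_labels = ["step A"] * 5 + ["step B"] * 2
X, y = oversample_min_count(toy_sentences, toy_labels, min_count=4)
```

When combined with cross-validation, a function like this would normally be
applied to the training folds only, so that duplicated sentences do not leak into
the held-out fold.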
To evaluate the proposed classification framework, we performed 10-fold
cross-validation. The unbalanced nature of the dataset also complicates the
evaluation: accuracy is not the most appropriate performance metric, because simply
voting for the majority class can yield high accuracy while being no better than
random guessing. We therefore report performance in terms of Cohen’s kappa
coefficient (Cohen, 1960), a statistic that compares the classifier’s accuracy
against chance accuracy. A kappa of 0.0 indicates a classifier that is no better than
chance, while a kappa of 1.0 corresponds to perfect classification.
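
To make the contrast between accuracy and kappa concrete, the sketch below computes
Cohen's kappa as (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and
p_e is the agreement expected by chance from the label frequencies. The function
name and the toy data are hypothetical, and returning 0.0 in the degenerate case
p_e = 1 is a convention chosen for this sketch only.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between gold labels and predicted labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement: plain accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: sum over labels of P(true = c) * P(pred = c).
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        return 0.0  # only one label in both lists; kappa is undefined here
    return (p_o - p_e) / (1 - p_e)


# A majority-class voter on a 90/10 split: high accuracy, kappa of zero.
gold = ["majority"] * 90 + ["minority"] * 10
majority_vote = ["majority"] * 100
accuracy = sum(t == p for t, p in zip(gold, majority_vote)) / len(gold)
kappa = cohen_kappa(gold, majority_vote)
```

In this toy case the accuracy is 0.9 while the kappa is 0.0, because predicting the
majority label everywhere agrees with the gold labels exactly as often as chance
would.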
