Alisher Navo'i Tashkent State University of Uzbek Language and Literature
“THEORETICAL AND PRACTICAL ISSUES OF CREATING UZBEK NATIONAL AND EDUCATIONAL CORPORA”
International scientific-practical conference
Vol. 1, No. 01 (2021)
A PROBLEM OF PART-OF-SPEECH TAGGING IN THE UZBEK LANGUAGE CORPUS
O‘ZBEK TILI KORPUSIDA SO‘Z TURKUMLARINI TEGLASH MASALASI
Rabbimov Ilyos Mehriddinovich*
Kobilov Sami Saliyevich**
Qurdoshev Zarifjon Mansur o‘g‘li***
Abstract.
Feature extraction plays an important role in opinion classification with machine learning
algorithms. This paper analyzes the feature extraction methods used in automatic opinion classification
and addresses the task of classifying opinions in Uzbek text.
Keywords:
opinion classification, machine learning, feature extraction, stylistic features, statistical
features, part-of-speech features, semantic features.
Annotatsiya.
Mashinali o‘qitish algoritmlaridan foydalanib fikrlarni tasniflashda informativ
belgilarni ajratish muhim hisoblanadi. Ushbu maqolada fikrlarni avtomatik tasniflashda qo‘llaniladigan
informativ belgilarni ajratish usullari tahlil qilinadi va oʻzbek tilida yozilgan matnlardagi fikrlarni
tasniflash masalasi bajariladi.
Kalit so‘zlar:
fikrlarni tasniflash, mashinali o‘qitish, informativ belgilarni ajratish, uslubiy
xususiyatlar, statistik xususiyatlar, so‘z turkumlariga asoslangan xususiyatlar, semantik xususiyatlar.
As the use of the Internet increases, so does the volume of text created on social networks,
blogs, and e-commerce platforms. The demand for intelligent data analysis methods and algorithms
that extract high-quality information from such texts is growing. One of the research directions in the
intelligent analysis of textual data is the problem of automatic opinion classification. Opinion mining is
the study of people’s moods, viewpoints, opinions, attitudes, and feelings toward subjects such as
services, products, research, political issues, organizations, and other topics. The purpose of opinion
classification is to determine whether the opinion expressed in a text is positive, negative, or neutral.
Automatic opinion classification can be used to determine customers’ opinions of products or services
and to adapt those products or services to customers’ needs; to analyze public opinion on political events
or new laws; to help organizations gather feedback on their employees; and to let well-known people and
organizations automatically analyze opinions about themselves and their brands. In general, opinion
classification is performed at the document level, the sentence level, and the aspect level. Approaches to
automatic opinion classification can be divided into rule-based methods, machine learning methods, and
hybrid methods. Opinion classification includes the following main steps:
collection of textual information;
pre-processing of textual data;
feature extraction;
designing classification algorithms.
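The four steps above can be sketched end to end in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy corpus, the function names (`preprocess`, `extract_features`, `train`, `predict`), the bag-of-words features, and the multinomial Naive Bayes classifier with add-one smoothing are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter, defaultdict

# Step 1: collection. A tiny hand-made toy corpus (hypothetical, not the paper's data).
corpus = [
    ("juda yaxshi mahsulot", "positive"),
    ("xizmat juda yaxshi", "positive"),
    ("juda yomon mahsulot", "negative"),
    ("xizmat yomon", "negative"),
]

# Step 2: pre-processing. Lowercase, then keep runs of Latin letters and apostrophes.
def preprocess(text):
    return re.findall(r"[a-z'ʻ‘’]+", text.lower())

# Step 3: feature extraction. Bag-of-words counts (an assumed, very simple feature set).
def extract_features(tokens):
    return Counter(tokens)

# Step 4: classification. Multinomial Naive Bayes with add-one (Laplace) smoothing.
def train(samples):
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for text, label in samples:
        feats = extract_features(preprocess(text))
        class_docs[label] += 1
        word_counts[label].update(feats)
        vocab.update(feats)
    return class_docs, word_counts, vocab

def predict(model, text):
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    feats = extract_features(preprocess(text))
    scores = {}
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # log prior
        for word, count in feats.items():
            if word not in vocab:
                continue  # words never seen in training are ignored
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += count * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(corpus)
print(predict(model, "mahsulot yaxshi"))
print(predict(model, "yomon xizmat"))
```

With a realistic corpus, each step would be replaced by something more substantial (for example, a crawler for collection and a richer feature set), but the division of labour between the steps stays the same.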
This paper analyzes the feature extraction methods used in automatic opinion classification and
addresses the task of classifying opinions in Uzbek text.
When opinions are classified with machine learning or deep learning algorithms, texts must be
expressed as numerical vectors. A large body of research addresses the problem of obtaining feature
vectors that represent the statistical, lexical, stylistic, and semantic features of a text. Stylistic,
syntactic, part-of-speech (POS), and lexicon-based features, as well as one-hot encoding, term
frequency-inverse document frequency (TF-IDF), Word2Vec, GloVe, and fastText models, are used for
feature extraction from text.
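To make two of these representations concrete, the following sketch computes one-hot vectors and TF-IDF vectors for a toy tokenized corpus. It is an illustration under stated assumptions: the three example documents and the helper names `one_hot` and `tfidf` are hypothetical, and the unsmoothed weighting tf × log(N / df) is only one of several TF-IDF variants in common use.

```python
import math
from collections import Counter

# Toy tokenized corpus (hypothetical examples, not the paper's data).
docs = [
    "yaxshi mahsulot".split(),
    "yomon mahsulot".split(),
    "juda yaxshi xizmat".split(),
]

# Fixed vocabulary: every distinct word gets one vector position.
vocab = sorted({word for doc in docs for word in doc})
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Vector with a single 1 at the position of `word` in the vocabulary."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

def tfidf(doc):
    """TF-IDF vector of a document from `docs`, using tf * log(N / df)."""
    counts = Counter(doc)
    vec = [0.0] * len(vocab)
    for word, count in counts.items():
        tf = count / len(doc)                   # relative term frequency
        df = sum(1 for d in docs if word in d)  # documents containing the word
        idf = math.log(len(docs) / df)          # rarer words get larger weights
        vec[index[word]] = tf * idf
    return vec

print(vocab)
print(one_hot("yaxshi"))
print(tfidf(docs[1]))
```

In the second document, "yomon" occurs in only one document of the corpus while "mahsulot" occurs in two, so "yomon" receives the larger TF-IDF weight. This is the property that makes TF-IDF more informative than raw counts for classification.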