Creating parallel and comparable corpora for work in domain-specific areas of language


Parallel corpora - alignment & annotation problems

  • Different linguistic theories = different annotation schemes
    • E.g. morphological, syntactic or semantic?
  • Different languages = different annotation schemes
    • E.g. English / Portuguese / Polish / Finnish / Chinese
  • Different languages = different types of alignment
    • E.g. English / Hebrew / Chinese
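Alignment is usually done first at sentence level. As a rough illustration only, the sketch below aligns two lists of sentences by dynamic programming over 1-1, 1-0, 0-1, 2-1 and 1-2 "beads", in the spirit of length-based aligners such as Gale and Church's. The cost function is a simplified length-difference score assumed for illustration (not their probabilistic model), and `align_sentences` and `skip_penalty` are hypothetical names.

```python
def align_sentences(src, tgt, skip_penalty=3.0):
    """Align two sentence lists using 1-1, 1-0, 0-1, 2-1 and 1-2 beads.

    The cost is a simplified length-difference score (an assumption,
    not the original Gale-Church probability model).
    """
    def cost(s_len, t_len):
        # Penalise beads whose character lengths differ proportionally.
        return abs(s_len - t_len) / max(s_len + t_len, 1)

    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    beads = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    # Row-major order guarantees every predecessor is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    c = skip_penalty  # unmatched sentence (omission/addition)
                else:
                    s_len = sum(len(s) for s in src[i:ni])
                    t_len = sum(len(t) for t in tgt[j:nj])
                    c = cost(s_len, t_len)
                if best[i][j] + c < best[ni][nj]:
                    best[ni][nj] = best[i][j] + c
                    back[ni][nj] = (i, j)
    # Trace back from the end to recover the chosen beads.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return pairs[::-1]
```

A pure character-length cost works reasonably for closely related European languages, but it is a poor fit for pairs such as English / Chinese, where character counts differ systematically, which is one reason different language pairs call for different alignment strategies.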

Parallel corpora - professional uses

  • Translation memories – aligned collections of repetitive texts in special domains
    • Provide previous translations for the translator to consult or copy
    • Make the translation process more economical
    • Provide material for probabilistic machine translation
    • E.g. EU translation services, Canadian Hansard
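The "consult / copy" step can be illustrated with a fuzzy-match lookup: given a new source segment, the memory returns stored segments that are similar enough, together with their translations. The sketch below is a minimal illustration assuming Python's standard `difflib.SequenceMatcher` as the similarity measure; real translation-memory tools use their own match scores, and the two memory entries, `tm_lookup` and the 0.75 threshold are invented for the example.

```python
from difflib import SequenceMatcher

# A toy translation memory (source segment -> stored translation).
# The entries are invented for illustration only.
TM = {
    "The Commission shall adopt implementing acts.": "A Comissão adota atos de execução.",
    "This Regulation shall enter into force on the twentieth day.": "O presente regulamento entra em vigor no vigésimo dia.",
}


def tm_lookup(segment, memory, threshold=0.75):
    """Return (score, source, target) matches at or above threshold, best first."""
    matches = []
    for src, tgt in memory.items():
        # ratio() is a similarity in [0, 1] based on matching character blocks.
        score = SequenceMatcher(None, segment.lower(), src.lower()).ratio()
        if score >= threshold:
            matches.append((score, src, tgt))
    return sorted(matches, reverse=True)
```

For example, `tm_lookup("The Commission shall adopt delegated acts.", TM)` retrieves the first entry as a fuzzy (non-exact) match, which the translator can then edit rather than translate from scratch.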

Translation memories – requirements

  • “Garbage in = garbage out!”
  • Originals must be of good quality, hence:
    • Emphasis on good editing and proofreading, leading to controlled language
    • E.g. EU documentation – training people to edit English documents written by non-native speakers
  • Translations must also be of good quality, but keep a certain parallel relationship to the original
  • Therefore: tendency to homogeneity
    • (e.g. Eurospeak)
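Controlled language is typically enforced by checking drafts against house rules before they are translated and stored. The sketch below assumes two hypothetical rules, a maximum sentence length and a list of discouraged words with preferred replacements; `MAX_WORDS`, `PREFERRED` and `check_sentence` are invented for illustration and are not taken from any EU style guide.

```python
import re

# Hypothetical house rules for a controlled-language check.
MAX_WORDS = 25
PREFERRED = {"utilise": "use", "in order to": "to", "prior to": "before"}


def check_sentence(sentence):
    """Return a list of rule violations found in one sentence."""
    problems = []
    words = sentence.split()
    if len(words) > MAX_WORDS:
        problems.append(f"too long: {len(words)} words (max {MAX_WORDS})")
    lowered = sentence.lower()
    for phrase, replacement in PREFERRED.items():
        # Match whole words or phrases only, not substrings of longer words.
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            problems.append(f"replace '{phrase}' with '{replacement}'")
    return problems
```

Rules like these push writers towards the same few formulations, which helps translation-memory reuse but also contributes to the homogeneity noted above.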

Parallel corpora - academic uses

  • For studying the translation process
  • For studying translation solutions
  • E.g.
    • INTERSECT – French/English (Brighton)
    • English-Norwegian Parallel Corpus Project (Oslo)
    • COMPARA/DISPARA – Portuguese/English – online at http://www.portugues.mct.pt/
  • For terminology extraction
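One simple way to use an aligned corpus for terminology extraction is to score candidate source/target word pairs by how often they co-occur in aligned sentence pairs. The sketch below uses the Dice coefficient as that association score; this is an illustrative choice, `dice_candidates` and `min_score` are hypothetical names, and a realistic extractor would also filter function words and handle multi-word terms.

```python
from collections import Counter
from itertools import product


def dice_candidates(aligned_pairs, min_score=0.5):
    """Score (source word, target word) pairs by co-occurrence in aligned sentences.

    Dice = 2 * C(s, t) / (C(s) + C(t)), where C counts sentence pairs
    containing the word (or both words).
    """
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src, tgt in aligned_pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_counts.update(s_words)
        tgt_counts.update(t_words)
        pair_counts.update(product(s_words, t_words))
    scores = {
        (s, t): 2 * c / (src_counts[s] + tgt_counts[t])
        for (s, t), c in pair_counts.items()
    }
    return sorted(
        ((score, s, t) for (s, t), score in scores.items() if score >= min_score),
        reverse=True,
    )
```

On a tiny invented English/Portuguese sample such as ("the corpus is aligned", "o corpus está alinhado"), ("the corpus is annotated", "o corpus está anotado") and ("the text is aligned", "o texto está alinhado"), the pair "aligned"/"alinhado" scores 1.0, but so does "the"/"o", which is why stopword filtering matters in practice.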

Parallel corpora - requirements

  • Theory should allow for any original + translation - warts and all!
    • Much literary criticism of translation thrives on the ‘warts’!
    • Useful for the study of errors, translationese, etc.
  • Practical applications require quality:
    • Contrastive linguistics
    • Pedagogical applications
    • Terminology extraction
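For these practical applications, a common heuristic is to discard aligned pairs whose lengths are implausibly mismatched, which tends to catch misalignments and omissions. The sketch below assumes a simple character-length ratio with a hypothetical threshold of 2.0 (`filter_pairs`, `max_ratio` and `min_chars` are invented names); suitable thresholds depend on the language pair, and a study of translation errors or translationese would deliberately skip this step and keep the "warts".

```python
def filter_pairs(pairs, max_ratio=2.0, min_chars=1):
    """Keep aligned pairs whose character-length ratio looks plausible.

    The threshold is an assumed heuristic, not a fixed standard.
    """
    kept = []
    for src, tgt in pairs:
        s, t = len(src.strip()), len(tgt.strip())
        if s < min_chars or t < min_chars:
            continue  # drop empty or one-sided segments
        if max(s, t) / min(s, t) <= max_ratio:
            kept.append((src, tgt))
    return kept
```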
