Neasa Ní Chiaráin, Ailbhe Ní Chasaide
Trinity College, Dublin, Ireland
nichiarn@tcd.ie
Evaluating Text-To-Speech Synthesis for CALL Platforms
Bio data
Neasa Ní Chiaráin is currently a postdoctoral researcher in the Phonetics & Speech Lab,
Centre for Language & Communication Studies, Trinity College, Dublin. She is primarily
working on the development of the Irish language text-to-speech (TTS) synthesis project
ABAIR. This involves developing the linguistic resources that underpin the TTS synthesis
system. In parallel, she is conducting research on the development and application of
Computer-Assisted Language Learning (CALL) materials which incorporate TTS voices.
She has recently completed a PhD thesis entitled “TTS synthesis in CALL: The
Development and Evaluation of Irish Language CALL Platforms”.
Abstract
This paper analyses factors associated with the methodologies used for the evaluation of
text-to-speech (TTS) synthesis in the context of CALL platforms. The study is based
specifically on Irish language unit-selection TTS, as developed in the School of Linguistic,
Speech and Communication Studies, Trinity College, Dublin, as part of the ABAIR project.
It has been claimed that evaluations of TTS synthesis should be based on both
judgmental and empirical analyses (Chapelle, 2001). Both approaches to analysis are
evident in the literature (see, for example, Pellegrini, Costa, & Trancoso (2012), where
Word Error Rate (WER) is measured after a dictation exercise, and Handley (2009), where
evaluators were asked for their opinions on the adequacy, acceptability and quality of the
TTS voices in specific roles). This paper presents a formula devised to bring together
judgmental and empirical evaluation measures by taking two separate scores (a
‘performance’ score and an ‘opinion’ score) into account. It is argued that the
intelligibility of synthetic speech is not an “all-or-nothing” phenomenon: even if
listeners can transcribe TTS content successfully, an assessment of the ease with which
they can do so is highly relevant to the evaluation of the speech.
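WER, the empirical measure cited above, is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference sentence and a listener's transcription, divided by the number of words in the reference. The following is a minimal Python sketch of that standard definition, not the scoring procedure used by Pellegrini, Costa, & Trancoso (2012) or in this study. The normalisation choices (lower-casing, stripping punctuation from the edges of each word) and the Irish example sentence are assumptions made purely for illustration.

```python
import string


def tokenise(text: str) -> list[str]:
    """Lower-case, split on whitespace and strip edge punctuation.

    Assumed normalisation: word-internal apostrophes and hyphens
    (e.g. Irish "d'fhéach", "an-mhaith") are kept, and fadas survive
    str.lower() unchanged.
    """
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = tokenise(reference)
    hyp = tokenise(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (previous row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                 # reference word deleted
                curr[j - 1] + 1,             # extra word inserted
                prev[j - 1] + substitution,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical dictation item: the listener omits "go".
    print(word_error_rate("Tá an lá go breá.", "tá an lá breá"))
```

With this hypothetical item, omitting one word out of five gives a WER of 0.2. Note that a WER of 0 says nothing about how hard the listener had to work to reach it, which is the gap the opinion score is meant to fill.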
In this study, Irish language teachers (N = 31) were asked (1) to transcribe 20 sentences
orthographically and (2) to give their opinion of the ease with which they could do so.
The scores for the two measures were then combined. It is argued that these
combined scores give a truer evaluation of the intelligibility of the TTS synthesis than
may be gleaned from either judgmental or empirical measures on their own.
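The abstract does not state the formula itself, so the following is a hypothetical sketch of one way a performance score and an opinion score could be combined, not the formula devised in this paper. It assumes that performance is the mean per-sentence proportion of words transcribed correctly (for instance, 1 − WER clipped at 0), that ease is rated on a 1 to 5 scale, and that the two normalised scores are merged by a weighted mean with an arbitrary default weight of 0.5. All function names, the rating scale and the weight are illustrative assumptions.

```python
from statistics import mean


def normalise_rating(rating: int, low: int = 1, high: int = 5) -> float:
    """Map an ease rating on an assumed low..high scale onto [0, 1]."""
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} outside {low}..{high}")
    return (rating - low) / (high - low)


def combined_score(accuracies: list[float], ratings: list[int],
                   weight: float = 0.5, low: int = 1, high: int = 5) -> float:
    """Hypothetical combination; not the formula devised in this paper.

    accuracies: per-sentence proportion of words transcribed correctly (0..1).
    ratings:    per-sentence ease-of-transcription ratings on low..high.
    weight:     share given to performance; (1 - weight) goes to opinion.
    """
    if not accuracies or len(accuracies) != len(ratings):
        raise ValueError("need one accuracy and one rating per sentence")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    performance = mean(accuracies)  # empirical ("performance") score
    opinion = mean(normalise_rating(r, low, high) for r in ratings)  # judgmental score
    return weight * performance + (1 - weight) * opinion


if __name__ == "__main__":
    # Two hypothetical listeners transcribe both sentences perfectly,
    # but only the first finds it easy.
    print(combined_score([1.0, 1.0], [5, 5]))
    print(combined_score([1.0, 1.0], [2, 2]))
```

In this sketch both listeners would score 1.0 on transcription accuracy alone, yet their combined scores differ (1.0 versus 0.625), which illustrates the argument that intelligibility is not all-or-nothing. A weighted mean is chosen only because it keeps both components on the same 0 to 1 scale and makes their relative influence explicit.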