2014 CALL Conference
LINGUAPOLIS
www.antwerpcall.be
They noted a positive reaction from students and teachers at the time but, despite the
very significant strides made in the development of TTS systems in the intervening
quarter century, this anticipated potential has not been widely exploited to date. TTS
has not attracted a great deal of interest among second/foreign language teachers and has not
been widely used in CALL (Gupta & Schulze, 2012). Various reasons for this have been
proffered: less-than-perfect prosody; a lack of clarity and consistency; voices that are
insufficiently expressive; a lack of human emotion; and only a small range of voices available
(Delogu, Conte, & Sementina, 1998; Keller & Zellner-Keller, 2000; Sha, 2010). Higgins
(1988) concluded that one should not bother using synthesised speech as a model for
learners because it cannot account for the complexity of human language. While there
have been huge advances in TTS technology since the 1980s, it is still less than
perfect, and some claim that the basic criticism still holds (Handley, 2005).
If Computer-Assisted Language Learning (CALL) programs are to be developed in which the
learner can have a genuinely interactive relationship with the computer, so that learning
can become highly personalised and tailored to the unique needs of each individual, then
TTS synthesis takes centre stage. Speech recognition technology also has an obvious
role, but it is difficult to envisage at present how it might deal with recognising the
imperfect utterances of learners, and this question is not considered in the present
paper.
With the aid of increasingly robust artificial intelligence (AI) systems, computers can
interact with the learner in relatively unpredictable ways. It is therefore not
possible to pre-record the novel utterances that the computer may produce. The question
arises as to whether present-day TTS systems are ‘good enough’ to be part of a
personalised CALL program which offers the learner greater control over his/her own
learning process. This paper describes an evaluation of an Irish language TTS system,
ABAIR, which is being developed at the Centre for Language and Communications
Studies, Trinity College, Dublin, in the context of its being used in a specifically designed
interactive game-like CALL environment.
The general framework on which the ABAIR system is built is Festival 1.95, which is a
version of the Festival Speech Synthesis System (Clark, Richmond, & King, 2004)
developed at the Centre for Speech Technology Research (CSTR) at the University of
Edinburgh. Festival offers a general framework for building speech synthesis systems. It
is an open source system, from which a number of speech engines are available for the
development of new voices using unit-selection synthesis. Unit-selection synthesis is
based on recordings of real human speech. It is a form of concatenative speech synthesis
where potentially minute segments of recorded speech are pieced together in order to
produce synthetic speech output. This is one of the most popular techniques currently in
use, particularly by commercial companies, as it generates the most natural-sounding
speech output.
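
To make the idea of piecing recorded segments together more concrete, the following is a minimal sketch of unit selection, not the Festival or ABAIR implementation. It assumes a toy inventory in which each recorded unit carries a phone label, an average pitch and a few waveform samples, and it chooses the sequence of units that minimises a target cost plus a join cost by dynamic programming. The names `Unit`, `INVENTORY`, `target_cost`, `join_cost`, `select_units` and `concatenate`, and the pitch-only cost functions, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str      # phone label that this recorded segment realises
    pitch: float    # assumed feature: average pitch of the segment in Hz
    samples: tuple  # stand-in for the recorded waveform samples


def target_cost(unit, wanted_pitch):
    # How far the recorded unit is from what the utterance calls for.
    return abs(unit.pitch - wanted_pitch)


def join_cost(prev, nxt):
    # How audible the seam between two adjacent units is likely to be.
    return abs(prev.pitch - nxt.pitch)


def select_units(targets, inventory):
    """Pick one unit per (phone, wanted_pitch) target, minimising total cost."""
    if not targets:
        return []
    candidates = [[u for u in inventory if u.phone == phone]
                  for phone, _ in targets]
    if any(not c for c in candidates):
        raise ValueError("no recorded unit for some target phone")

    # best[i][j] = (cheapest cost ending in candidate j at position i, backpointer)
    best = [[(target_cost(u, targets[0][1]), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            tc = target_cost(u, targets[i][1])
            row.append(min(
                (best[i - 1][k][0] + join_cost(p, u) + tc, k)
                for k, p in enumerate(candidates[i - 1])
            ))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


def concatenate(units):
    # Piece the chosen recorded segments together into one output signal.
    out = []
    for u in units:
        out.extend(u.samples)
    return out


# Toy inventory: two recordings of each phone at different pitches.
INVENTORY = [
    Unit("a", 100.0, (1, 2)),
    Unit("a", 200.0, (3, 4)),
    Unit("b", 110.0, (5,)),
    Unit("b", 210.0, (6,)),
]

if __name__ == "__main__":
    chosen = select_units([("a", 150.0), ("b", 210.0)], INVENTORY)
    print([(u.phone, u.pitch) for u in chosen], concatenate(chosen))
```

In the example call, both recordings of "a" are equally far from the requested pitch, so the join cost decides the choice: the higher-pitched "a" is selected because it joins more smoothly with the "b" that follows. A real system would use far richer features and signal smoothing at the joins.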