11.2 MODELLING THE MENTAL LEXICON
Models of the mental lexicon fall into two broad types: those that attempt to characterise words in the mind from the speaker’s perspective and those that do so from the hearer’s perspective. The former represent speech production and the latter speech comprehension. We will explore the speech comprehension models first before treating the speech production models.
11.2.1 Understanding speech
What part is played by the mental lexicon in speech comprehension? Marslen-Wilson (1989b: 3) gives this
answer:
The role of the mental lexicon in human speech comprehension is to mediate between two
fundamentally distinct representational and computational domains: the acoustic-phonetic analysis of
the incoming speech signal, and the syntactic and semantic interpretation of the message being
communicated.
Marslen-Wilson observes that the central problem in speech perception is that the hearer is simultaneously
faced with two tasks. One task is decoding the acoustic signal that hits the ear-drums; the other task is
untangling the higher levels of word meaning, grammatical structure, sentence meaning and the meaning
that the speaker intended to convey. In other words, the task is one of deciphering noises and attaching
meanings to them. As we noted above, the ability to do this at all is remarkable. What is even more
amazing is the speed with which it is done. Normally we understand speech instantaneously. When
someone says something to us, we do not go away for half an hour and do all the necessary acoustic-
phonetic computations, followed afterwards by the syntactic-semantic analysis before coming up with an
interpretation. Native speakers of English have been shown to be able to recognise a word within about one
fifth of a second from the moment the speaker begins uttering it. If all goes well, in normal conversation
you literally figure out the words and their meaning before they are out of your interlocutor’s mouth. On
average, words are recognised in context 200 milliseconds from the moment the speaker utters the first
sound of the word, even though at that point there are insufficient auditory clues to identify the
word (Marslen-Wilson 1987, 1989b). Clearly, sensory input plays a role in identifying the words heard, but
it is not the only factor. The listener must be able to use other means. A lot of intelligent guessing goes on.
A particular auditory cue is tested for goodness of fit in the linguistic and non-linguistic context. With the
minimal phonetic clues obtained in the first 200 milliseconds, which word is most likely to make sense? We
will return to this later in subsections (11.2.1) and (11.2.2).
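The incremental character of recognition can be made concrete with a small sketch. The following Python fragment is purely illustrative, with an invented toy lexicon and crude spelling-based transcriptions; it shows, in the spirit of Marslen-Wilson’s cohort account, how the set of candidate words shrinks as successive segments of the input arrive:

# Toy illustration of incremental word recognition: as each segment
# arrives, the candidate set (the 'cohort') is narrowed to the words
# still consistent with the input heard so far. The lexicon and the
# rough transcriptions are invented for the example.

LEXICON = {
    "tread":    "tred",
    "trek":     "trek",
    "tress":    "tres",
    "trestle":  "tresl",
    "trespass": "trespas",
}

def cohort(heard_so_far):
    """Return the words whose transcription begins with the input so far."""
    return [word for word, trans in LEXICON.items()
            if trans.startswith(heard_so_far)]

# Feed in "trespass" one segment at a time and watch the cohort shrink;
# by the fifth segment only one candidate is left.
target = "trespas"
for i in range(1, len(target) + 1):
    print(target[:i], "->", cohort(target[:i]))

On this picture a word can be recognised as soon as only one candidate survives, which may happen several segments before the end of the word; this is consistent with listeners identifying words well before the speaker has finished uttering them.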
Let us take a closer look at the acoustic decoding task first. To recognise a word it helps if one can
identify the individual sounds which represent that word. So, the hearer goes through a PHONETIC
STAGE. This involves the identification of noises. To this end, the hearer looks out for the acoustic clues
which help to identify segments. For instance, to identify the first two sounds in the word spin the hearer,
among other things, detects the turbulence of the fricative [s] and the fact that it is followed by a stop. It is
quite likely that initially it will be impossible to determine whether the stop is [p] or [b] because acoustically
it will be very unclear which it is. The next stage in sound perception is the PHONOLOGICAL STAGE. It
is at this stage that it will become clear that the sound in question is /p/, not /b/. If you are a speaker of
English, you know how sounds in your language function. You know the phonotactic constraints on the
positions where sounds can appear.
(i) You know that if a word-initial fricative is followed by a stop, that fricative must be /s/. Only /s/ is
allowed to occur at the beginning of a word if the second sound of the word is a stop. No English
word can begin with /zk/, /fm/, /ʃk/ etc.
(ii) You also know that if the word begins with /s/ and the sound following /s/ is a stop, that stop must be
voiceless. There are words like spin, spoon, stick, skin etc. where /s/ is followed by a voiceless stop.
But there are no words like *sbin, *sboon, *sdick, or *sgin where /s/ is followed by a voiced stop. This
is a phonotactic constraint on the combination of fricatives with stops in English phonology. It is not
something simply determined by their acoustic properties. If the /s/ of spin is electronically spliced off, you would probably hear the word left behind as [bin], not [pin]. Why is this? It is because the main cue for
distinguishing between [p] and [b] occurring initially in a stressed syllable is aspiration. If you detect
aspiration, you assume it is [pʰ]; if you do not, you assume it is [b]. As the sound in spin was preceded
by /s/ before splicing off the [s], it was not initial and so it was unaspirated. So it is perceived as [b].
Obviously, linguistic knowledge of this kind, this COMPETENCE, lies hidden deep in the mind and
you are unlikely to be conscious of it without taking a course in linguistics.
We have established that there is no one-to-one match between acoustic phonetic cues and the phonological
interpretation we give them. What is perceived as the ‘same’ sound is not physically the same sound in all
contexts. Very much depends on the context in which the cues are perceived and on what we know to be
permissible in that context in the language.
One model that has been proposed to account for the way people perceive speech is ANALYSIS BY
SYNTHESIS (cf. Halle and Stevens (1962), Studdert-Kennedy (1974, 1976), Stevens and House (1972)).
Its proponents claim that hearers recognise speech sounds uttered by speakers by matching them with
speech sounds that are synthesised in their heads. Specifically, the synthesising is said to involve modelling
the articulatory gestures that the speaker makes to produce those sounds. In the light of what was said above
concerning the incredible speed of word recognition, it is implausible that hearers could carry out the analysis-by-synthesis routine in the time available. For this reason, many researchers reject this model.
Clearly, the perception of individual speech sounds is far from straightforward. But the perception of
running speech presents an even greater challenge. A major problem (which people are most acutely aware
of when listening to a language in which they have little competence) is that, in fluent speech, words come
out in a gushing stream. In purely physical terms, it is normally impossible to hear where one word ends and
the next one begins. Looking up each word in the mental lexicon as it is heard is not a credible strategy.
But even if it were possible to separate out words, which clearly it is not, there would be the additional
problem of NOISE, in a very broad sense. In many real life situations, there is not a perfect hush around us
as we speak. There is noise. Lots of noise. In a pub, at a party, at work, in a railway station, in the home, there
are often other people talking, banging, operating noisy machines, playing loud music etc. So we hear some
of the words only partially —if at all. Yet we manage to work out what the other person is saying. How do
we do it? Knowing what is relevant in the context helps. We can make intelligent guesses.
In cases where we communicate using almost fixed formulas, GUESSING is relatively easy. Imagine you
drop in at a friend’s house at 11 a.m. and shortly after welcoming you, your host says:
[11.9]
Would you like *** or ***?
You do not hear the bits marked by asterisks because a loud heavy goods lorry goes past the open window
as she says ***. I expect you would have no problem guessing that the words you did not hear were tea and coffee. Experience tells you that in this situation they are the most likely words to be used.
Sometimes you are luckier. You manage to hear part of a word properly. In this situation again it is
usually possible to make out the entire word. Suppose the hearer identifies a bit of speech three syllables
long, beginning with [r] and ending in [tə] as in [11.10a]:
[11.10]
a. If you’re cold, get closer to the r***[tə].
b. If you’re cold, get closer to the radiator.
   If you’re cold, get closer to the red heater.
The hearer can then guess which word or group of words with the phonological outline that has been
perceived seems to make sense in the context. Either a radiator or a red heater could provide warmth. A
quick inspection to see whether there was a radiator or a red heater in the room would help to settle it.
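The search the hearer performs here can be pictured as pattern matching over the lexicon. The Python sketch below is again a toy, with an invented word list and crude spelling-based stand-ins for transcriptions; it retrieves the entries that fit the perceived outline and leaves context to choose among them:

import re

# Toy lexicon paired with rough phonological outlines (plain-letter
# stand-ins for real transcriptions; both columns are invented).
LEXICON = {
    "radiator":   "reidieite",
    "red heater": "redhi:te",
    "rotator":    "reuteite",
    "reader":     "ri:de",
}

def match_outline(pattern):
    """Return the entries whose outline fits the partially heard pattern."""
    rx = re.compile(pattern)
    return [word for word, outline in LEXICON.items() if rx.fullmatch(outline)]

# The hearer caught an initial [r] and a final [te]; the middle was
# masked by noise, so any material may fill the gap.
print(match_outline(r"r.*te"))  # -> ['radiator', 'red heater', 'rotator']

A quick look round the room then plays the role of the final filter, exactly as described above.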
If we are in a particular situation, and know what is RELEVANT in that situation, we can discard some
of the possible words we might think we hear; we can also reject homophonous words which are
inappropriate in the circumstances, and select the more plausible, relevant word:
[11.11]
[vɪkəz ər ɒn straɪk]
If you live in the English shipbuilding town of Barrow, where most of the working population are employed in Vickers’ shipbuilding yard, and you have seen hundreds of women and men staging a protest outside the gate of the shipyard, you would probably recognise the word as referring to workers at Vickers’
and not the vicars with dog collars from all the town’s churches. Knowledge that priests do not strike is also
helpful, of course. So you would use your world knowledge to eliminate vicars. Context and relevance are
vitally important in speech recognition when, for whatever reason, the words perceived are partly unclear.
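This process of selection by relevance can itself be sketched in code. The following Python fragment is a deliberately crude toy: the two readings, their ‘typical associations’ and the context words are all invented for illustration, and real disambiguation is vastly more subtle. It simply picks whichever reading of [vɪkəz] shares more associations with the current situation:

# Toy relevance check: two readings of the same sound sequence are
# scored by how many of their typical associations occur in the
# hearer's current context. All the word sets are invented.

ASSOCIATIONS = {
    "Vickers'": {"shipyard", "workers", "union", "gate", "strike"},
    "vicars":   {"church", "parish", "sermon", "collar"},
}

def pick_reading(context_words):
    """Choose the homophone whose associations best overlap the context."""
    return max(ASSOCIATIONS,
               key=lambda reading: len(ASSOCIATIONS[reading] & context_words))

context = {"shipyard", "gate", "protest", "workers"}
print(pick_reading(context))  # -> Vickers'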