THEME 12. USING CORPUS DATA FOR
PEDAGOGICAL PURPOSES
Plan:
1. What is a corpus and how does it differ from a dictionary?
2. Words in context: finding out how words are used in a language
3. Should teachers use corpora?
A corpus is a collection of texts. We call it a corpus (plural:
corpora) when we use it for language research. That makes your
class's essays a corpus - a small one. It also makes the internet a
corpus - a big one.
People writing dictionaries are in the vanguard of corpus
linguistics. If you are writing a dictionary, the biggest crime is to miss
things: to miss words, to miss phrases or idioms, to miss meanings of
words. Lexicographers (the people who write dictionaries) have
known for a long time that the best way to avoid missing things is to
have a big corpus, and a computer. The computer can then find all the
words (ordered by frequency) so a lexicographer can check the list to
make sure that words are not missed.
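To make the frequency list concrete, here is a minimal sketch in
Python. It is not how any particular dictionary publisher does it; the
function name frequency_list and the two sample sentences are my own
inventions for illustration, and the tokenisation (lowercase runs of
letters) is a deliberate simplification.

```python
import re
from collections import Counter

def frequency_list(texts):
    """Return (word, count) pairs for a list of texts, most frequent first."""
    counts = Counter()
    for text in texts:
        # Simplified tokenisation (an assumption): lowercase runs of letters,
        # optionally with one internal apostrophe, as in "class's".
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    return counts.most_common()

# Invented sample sentences standing in for a real corpus.
corpus = [
    "The volcano erupted and the village was evacuated.",
    "The volcano is still active.",
]

for word, count in frequency_list(corpus)[:5]:
    print(word, count)
```

With a real corpus the list runs to many thousands of words, and the
lexicographer's job is to check it against the dictionary's word list.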
The computer can also show them all the examples of a word in
context. This is called a concordance. By running their eye over the
concordance, lexicographers can find all the meanings of the word, and
the phrases it occurs in.
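A concordance is simple enough to sketch in a few lines of Python. The
version below is only an illustration: the function name concordance,
the width parameter and the sample sentences are assumptions of mine,
and a real corpus tool would work from an index rather than scanning
the raw text each time.

```python
import re

def concordance(texts, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword."""
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}} [{m.group()}] {right}")
    return lines

# Invented sample sentences standing in for a real corpus.
corpus = [
    "The region voted to secede from the country.",
    "Few states secede peacefully.",
]

for line in concordance(corpus, "secede"):
    print(line)
```

Lining the keyword up in a central column is what lets the eye run
down the page and spot recurring words to the left and right.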
If it is a big corpus, or a common word (or both), there might be
thousands of examples of the word. Then, the computer can go one
step further, and prepare a 'word sketch', a summary of the contexts,
collocations and phraseology for the word.
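The idea of summarising collocations can be approximated very roughly
in Python. The sketch below just counts the words that appear within
two words of the target; a real word sketch does considerably more,
such as grouping collocates by grammatical relation and ranking them
statistically. The names collocates, STOPWORDS and window, and the
sample sentences, are my own assumptions.

```python
import re
from collections import Counter

# A tiny hand-picked stop list (an assumption) so that grammatical words
# do not swamp the more informative collocates.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "was", "were"}

def collocates(texts, node, window=2):
    """Count words occurring within `window` tokens of `node`."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, token in enumerate(tokens):
            if token != node:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i and tokens[j] not in STOPWORDS:
                    counts[tokens[j]] += 1
    return counts.most_common()

# Invented sample sentences standing in for a real corpus.
corpus = [
    "The two sides resumed negotiations on Monday.",
    "Negotiations stalled after the talks resumed.",
    "Negotiations stalled again last week.",
]

print(collocates(corpus, "negotiations"))
```

Even this crude count brings 'stalled' and 'resumed' to the surface,
which is the kind of information a word sketch presents in tidier form.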
This is how contemporary lexicography works. Lexicographers
start from the word sketch, which gives them a good idea of what they
must not miss. They then work out what different meanings, grammar
and phraseology are shown by the collocations in the word sketch, and
write definitions for them. They can also use the corpus as a source for
example sentences.
When I say 'the computer', of course I mean an app that indexes
the corpus and lets users make concordances and word sketches.
Google is one app that does something like that (with the internet as
its corpus). However, it is not designed for people doing language
research. One that is widely used for making dictionaries, has lots of
corpora in it, and made the screenshots above is the Sketch Engine.
Dictionary-makers were leaders in corpus use. Following on
were people writing language courses. They wanted to make sure that
the facts they were teaching about the language were in fact true (!),
and to teach common patterns before rare ones, and to use authentic
examples of the patterns.
So, in English language teaching, there is plenty of indirect
corpus use, via dictionaries and course books. What about direct
corpus use, by teachers, even students? Should you use corpora?
My answer is: yes - if the dictionary does not tell you enough. If
you want to find out what 'negotiation' or 'secede' means, you could
start from a corpus but it will be long and slow: better to look it up in
your favourite dictionary. But if you know 'negotiation' and want to
use it, but are not sure what verb to use it with, then the leading
learners' dictionaries give little help. The word sketch, on the other
hand, promptly shows you that people resume, (re)start, (re)open and
conduct negotiations, and that negotiations stall, fail, get bogged
down, drag on and even collapse (each item can be clicked to see
examples of the collocation in use).
A second situation is the teacher who is marking work and whose
English is good, but who is not a native speaker. A student's essay
has 'seceding out of Ukraine' - is that OK? A quick check of the
concordance for 'secede' shows that a region secedes from a country.
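The same check can be done by counting, rather than reading, which
word follows the verb. Here is a minimal sketch in Python; the
function name following_words and the sample sentences are invented
for illustration, and a teacher would of course run it over a real
corpus rather than three made-up lines.

```python
import re
from collections import Counter

def following_words(texts, verb_pattern):
    """Count the word that comes directly after each match of verb_pattern."""
    regex = re.compile(r"\b(?:" + verb_pattern + r")\s+([a-z]+)", re.IGNORECASE)
    counts = Counter()
    for text in texts:
        for m in regex.finditer(text):
            counts[m.group(1).lower()] += 1
    return counts.most_common()

# Invented sample sentences standing in for a real corpus.
corpus = [
    "The region voted to secede from the federation.",
    "The province seceded from the union.",
    "Critics say seceding would be costly.",
]

# The pattern covers secede, secedes, seceded and seceding.
print(following_words(corpus, r"seced(?:e|es|ed|ing)"))
```

If 'from' dominates the counts and 'out' never appears, the teacher
has good grounds for correcting the student's 'seceding out of'.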
Another consideration is always student motivation. If a class is
currently engaged with volcanoes, it would be nice for them to look at
the English of volcanoes (I've felt an affinity for volcanoes ever since
my big end-of-primary-school project). So can we have a volcanic
corpus? Yes! The Sketch Engine has an instant corpus tool, where text
on a topic is gathered from the web in a few minutes (by a teacher or,
as a class exercise, by the students) and this is then the data for a mini
research project.