from primarily written sources and then recording the meaning of the
words (as well as additional information) on a citation slip, literally a slip
of paper for earlier dictionaries but now a computerized file containing
such information as the sentence in which the word was used as well as a
bibliographic entry of the source from which the word was taken.
Lexicographers rely heavily on citation slips because they determine the
meanings of words from the contexts in which they occur. The creation of
the OED provides a good illustration of how this process works.
The OED was an extremely ambitious project. As articulated in the 1859
statement “Proposal for the Publication of a New English Dictionary by
the Philological Society,” the dictionary was to include every word in the
English language from 1250 to 1858. Words to be included in the dictionary
would be based on vocabulary found in printed matter written during
these years. These goals resulted in “the only English dictionary ever cre-
ated wholly on the basis of citations” (Landau 2001: 191). The heavily
empirical nature of the OED placed a great burden on its creators to find
individuals willing to read books and create citation slips. To find volun-
teers, both the 1859 “Proposal” and a later (1879) document (“An Appeal to
the English-speaking and English-reading public to read books and make
extracts for the Philological Society’s New English Dictionary”), written
after James A. H. Murray became editor, actively solicited readers.
Because the OED was intended to be a historical dictionary, it was decided
that it should include vocabulary taken from texts written during
three time periods: 1250–1526, 1526–1674, and 1674–1858. These three
time frames were chosen because they delineate periods “into which our
language may, for philological purposes, be most conveniently divided”
(from Proposal for the Publication of A New English Dictionary by the
Philological Society, Philological Society, 1859, p. 5). The year 1526, for
instance, marked
the publication of the first printed edition of the New Testament in
English, 1674 the death of Milton. While these are certainly important
historical events, they hardly correspond to the major periods in the
development of English, especially since the OED is based exclusively on
written
texts, ignoring speech completely. Moreover, as Landau (2001: 207) notes,
“the core of citation files tend to be those of the educated and upper
classes,” hardly making them representative of the language as a whole.
But since there was really no feasible way (or desire, for that matter) to
collect spoken data during this period, it was unavoidable that the data be
biased in favor of written English.
The first edition of the OED, published in 1928, was based on four million
citation slips supplied by approximately 2,000 readers (Francis 1992:
21). These individuals, as Gilliver (2000: 232) notes, either provided specif-
ic examples of words, or collected them from sources they were asked to
read. Gilliver (2000) provides brief descriptions of the contributions that
some of these individuals made. For instance, one of the early editors of
the OED, Frederick James Furnivall, supplied 30,000 quotations taken from
newspapers and magazines (p. 238). Hartwig Richard Helwich, a Viennese
philologist, supplied 50,000 quotations, many from a medieval poem entitled
Cursor Mundi, “the most frequently cited work in the dictionary”
INTRODUCING ENGLISH LINGUISTICS
(p. 239). The physician Charles Gray contributed 29,000 quotations, many
providing examples of function words taken from texts written in the
eighteenth century (p. 238).
Specific instructions were given to readers telling them how they
should collect words for inclusion on citation slips:
Make a quotation for every word that strikes you as rare, obsolete,
old-fashioned, new, peculiar, or used in a peculiar way.
Take special note of passages which show or imply that a word is either
new and tentative, or needing explanation as obsolete or archaic, and
which thus help to fix the date of its introduction or disuse.
Make as many quotations as you can for ordinary words, especially when
they are used significantly, and tend by the context to explain or suggest
their own meaning.
(from the Historical Introduction of the original OED, reprinted in
Murray 1971: vi)
After a word was selected, it needed to be included on a citation slip,
which had a specific format, illustrated in Figure 6.1. The word appeared
in the upper left-hand corner of the slip, and was followed below by com-
plete bibliographical information of the source from which the word was
taken. The quotation itself was placed at the bottom of the slip. The slips
were then sent to Oxford, where they were placed in one of the 1,029
pigeon-holes in a Scriptorium constructed by the main editor of the OED,
James A. H. Murray. Murray and his assistants used the citation slips as the
basis for entries and illustrative quotations in the OED.
English words: Structure and meaning
Britisher
1883 Freeman Impressions U.S. iv. 29
I always told my American friends that I had rather be
called a Britisher than an Englishman, if by calling me
an Englishman they meant to imply that they were not
Englishmen themselves
Figure 6.1 Citation slip from OED.
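The layout of the slip in Figure 6.1 (headword, bibliographic source, quotation) maps naturally onto the kind of computerized record that has replaced paper slips. The following is only a minimal sketch: the class name `CitationSlip`, its field names, and the division of the source line into year, author, title, and locator are assumptions made for illustration, not a description of any dictionary publisher's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationSlip:
    """One citation slip; all field names are hypothetical."""
    headword: str    # the word being documented
    year: int        # date of the source
    author: str      # author of the source
    title: str       # (abbreviated) title of the source
    locator: str     # chapter/page reference within the source
    quotation: str   # the sentence in which the word was used

    def render(self) -> str:
        """Lay the slip out as in Figure 6.1: word, source, quotation."""
        source = f"{self.year} {self.author} {self.title} {self.locator}"
        return f"{self.headword}\n{source}\n{self.quotation}"


# The slip shown in Figure 6.1, entered as a record.
slip = CitationSlip(
    headword="Britisher",
    year=1883,
    author="Freeman",
    title="Impressions U.S.",
    locator="iv. 29",
    quotation=(
        "I always told my American friends that I had rather be called a "
        "Britisher than an Englishman, if by calling me an Englishman they "
        "meant to imply that they were not Englishmen themselves"
    ),
)
print(slip.render())
```

Keeping the date as a separate field is what would allow slips for the same headword to be sorted chronologically, which matters for a historical dictionary.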
Because lexicographers rely so heavily on context for meaning, they
must base their dictionaries on citation slips taken from very large
sources of text. This is because the frequency of vocabulary in a given text
follows Zipf’s Law, an empirical generalization about word frequency
formulated by George Kingsley Zipf (see Zipf 1932 for details). Essentially,
Zipf’s Law predicts that in any text, a small number of words will occur
very frequently and a large number of words will occur quite rarely. To
illustrate this point, consider the distribution of vocabulary in an earlier
paragraph in this section that contained a total of 119 words.
Table 6.2 lists the words in the paragraph occurring three or more times.
These eight words (including the combined frequencies of singular and
plural forms of word) constituted 36% (43 of 119) of the words occurring in
the paragraph. Of the remaining words in the paragraph, 12 words occurred
twice, accounting for 20% of the total (24 of 119), and 52 words occurred
only once, accounting for 44% (52 of 119). As Table 6.2 shows, the most
frequent words were function words: the articles the and a and prepositions
like of and on. The least frequent words in the paragraph were content
words – words such as methodology, monolingual, and discovering that
occurred only once. What these distributions mean for
lexicographers is that they must collect examples from very large databas-
es or they will not capture all the words occurring in a language or all the
meanings that they have, since the words that are of primary concern to
lexicographers – content words – are the words that occur least frequently.
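The kind of count reported above can be reproduced for any passage with a few lines of code. The sketch below is illustrative only: the function names `word_frequencies` and `frequency_profile` are invented for this example, the tokenizer is deliberately crude (lower-cased runs of letters), and the sample is a single sentence from this section, not the 119-word paragraph analyzed in Table 6.2.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count lower-cased word tokens (runs of letters a-z only)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def frequency_profile(counts: Counter) -> dict:
    """Map each frequency to (number of distinct words, share of all tokens).

    Frequencies are listed from highest to lowest.
    """
    total = sum(counts.values())
    words_per_frequency = Counter(counts.values())
    return {
        freq: (n_words, freq * n_words / total)
        for freq, n_words in sorted(words_per_frequency.items(), reverse=True)
    }


sample = (
    "Because lexicographers rely so heavily on context for meaning, they "
    "must base their dictionaries on citation slips taken from very large "
    "sources of text."
)
for freq, (n_words, share) in frequency_profile(word_frequencies(sample)).items():
    print(f"{n_words} word(s) occurring {freq} time(s): {share:.0%} of the text")
```

Run on a longer passage, a profile of this kind should show the pattern described above: a handful of words with high frequencies and a long list of words that occur only once.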
For this reason, modern lexicographers have abandoned handwritten
citation slips created by thousands of individuals and have turned instead
to collecting examples automatically from very large corpora. For
instance, the publisher Harper-Collins created the Collins Word Web
(www.collins.co.uk/books.aspx?group=180, accessed September 10, 2007)
as the source for citation files used to create a number of dictionaries that
they have published, including The Collins English Dictionary (2007). The
Collins Word Web is currently 2.5 billion words in length and contains var-
ious kinds of spoken and written English. It is constantly being updated so
that new words entering the language can be detected and included in
upcoming editions of dictionaries.
Advances in software development have also aided in the creation of
citation slips. A concordancing program can be used on any computerized
text to very quickly create a KWIC (keyword in context) concordance.
Figure 6.2 contains a KWIC concordance window based on a sampling of
occurrences in the Cambridge International Corpus of the word chair, a
word whose meaning was discussed earlier in the chapter. As this figure
illustrates, all instances of chair are vertically aligned so that their use in