Table 4.3: Top 25 word frequency counts for SettCorp and TravCorp
          SettCorp          TravCorp
Number    Word     %        Word     %
1         the      3.94     you      3.81
2         you      2.76     the      3.78
3         it       2.71     go       2.49
4         I        2.01     it       2.08
5         to       1.86     to       2.02
6         a        1.81     on       1.64
7         and      1.55     a        1.57
8         of       1.34     now      1.51
9         that     1.29     out      1.45
10        in       1.22     I        1.42
11        is       1.21     no       1.35
12        yeah     1.17     and      1.29
13        no       1.14     there    1.17
14        it’s     1.07     get      1.13
15        on       0.99     me       1.07
16        what     0.89     in       1.01
17        do       0.88     that     1.01
18        we       0.82     here     0.94
19        now      0.78     I’m      0.91
20        was      0.76     daddy    0.88
21        have     0.73     goin     0.85
22        one      0.72     way      0.85
23        there    0.71     what     0.85
24        like     0.66     yeah     0.85
25        all      0.64     look     0.82
Table 4.3, in addition to highlighting some potential pragmatic similarities (marked in
the table) between SettCorp and TravCorp, may also point toward likely differences
(also marked). For example, in relation to personal pronouns characteristic of the
deictic system in many languages, you (position two in SettCorp and one in TravCorp)
features more frequently than I (position four in SettCorp and ten in TravCorp). In
contrast, the pronoun we is in 18th position in SettCorp but does not appear in the top
25 words in TravCorp. Furthermore, the item that, which can potentially function as a
deictic marker, and the item now, traditionally associated with both temporal deictic
and discoursal functions, feature on both word frequency lists. A further variation
between the two datasets that might be indicated by the frequency lists is the term of
address daddy, which occurs in 20th position in TravCorp but is not present on the
SettCorp frequency list in Table 4.3. Finally, one token with the potential to hedge in
Irish English, like, appears in SettCorp (position 24) but does not feature within the
top 25 items on the TravCorp list.[4] These
similarities and differences between the two corpora offer fertile ground for the
researcher, especially in the initial stages of the research. For the researcher of
variational pragmatics, they offer a fledgling variational profile, whereas they offer a
tantalising glimpse of the presence (or not) of a shared linguistic repertoire for those
investigating the community of practice.
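
A word frequency list of the kind shown in Table 4.3 can be approximated in a few lines of Python. The following is a minimal sketch and not the procedure used to build the SettCorp or TravCorp lists: the tokenisation rule, the function names (tokenize, frequency_list, rank_of) and the sample utterance are all assumptions made for illustration. The sketch lowercases every token, so the pronoun I would be listed as i.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens, keeping contractions such as "it's" whole."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def frequency_list(text, top_n=25):
    """Return (word, raw count, percentage of all tokens) for the top_n most frequent words."""
    tokens = tokenize(text)
    total = len(tokens)
    counts = Counter(tokens)
    return [
        (word, count, round(100 * count / total, 2))
        for word, count in counts.most_common(top_n)
    ]


def rank_of(word, freq_list):
    """Return the 1-based position of word in a frequency list, or None if it is absent."""
    for position, (entry, _count, _percent) in enumerate(freq_list, start=1):
        if entry == word:
            return position
    return None


if __name__ == "__main__":
    # Invented sample text, not taken from either corpus.
    sample = "you go now you go you"
    for word, count, percent in frequency_list(sample):
        print(word, count, percent)
```

Comparing rank_of for the same word across two such lists gives the kind of positional comparison made above, for instance a word that is ranked in one list but returns None for the other.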
Corpus word frequency lists are, admittedly, a raw measure of comparability, based
on, as Table 4.3 demonstrates, the potential of a word form rather than its actual
function. Jautz (2008) examines the BNC and the Wellington Corpus of Spoken
New Zealand English for expressions of gratitude in British and New Zealand radio
phone-in and broadcast interviews. She comments that based solely on frequency,
there are 287 expressions of gratitude in the British corpus and 129 in the New
Zealand corpus, suggesting that the British are more polite because they use more
expressions of gratitude. However, Jautz demonstrates that when these expressions
are analysed more closely, the opposite appears to be the case. Similarly, Farr and
O’Keeffe (2002) examine the occurrences of the hedges I would say and I’d say in three
spoken corpora: LCIE, CANCODE and a corpus of American spoken data from the
Cambridge International Corpus. They found that these hedges are used twice as
frequently by Irish speakers as by their American counterparts. However, they label
this initial finding ‘restrictive in its insightfulness’ (p. 29) due to the fact that the
quantitative and geographically-constrained results generated by larger corpora do not
further an understanding of how or why hedges are used in face-to-face interaction.
[4] See Chapter 8 for more details on the use of like in the pragmatic systems of the two families.
Many corpus studies recommend that frequency analysis be complemented by a
detailed consideration of the environment of key words through the use of
concordances and collocational tools. For example, O’Keeffe and Adolphs (2008:
93) maintain that when there is a need to disambiguate form and function, corpus
linguistics provides ‘direct access to the source files and the exact location in the
original conversations in which the items occurred.’ To add further insight into their
raw frequency results, Farr and O’Keeffe (2002) explore the use of would as a hedging
device in an Irish setting using two varietal sub-corpora from LCIE, a 55,000 word
corpus of radio phone-in and a 52,000 word corpus of post-observation teacher
training interaction. Based on a qualitative examination of the hedges as they appear
in context, in addition to confirming that Irish speakers soften face threatening acts
such as disagreement or giving advice, they also found that very often speakers
downtone when speaking about themselves, even where the propositional content is
undisputed. This led them to conclude that hedges have a broader pragmatic function
in Irish English settings. They propose that in order to fully understand why speakers
hedge it is necessary to consider the Irish socio-cultural context. They maintain that
‘in Irish society, directness is very often avoided…‘forwardness’, which ranges from
being direct to being self-promoting, is not valued’ (p. 42). Therefore, Irish speakers
may feel added pressure to hedge in situations where British or American speakers
may think it unnecessary. Farr and O’Keeffe’s study demonstrates the merit of a
two-pronged approach to the use of corpora in variational pragmatics, where
intra-varietal, qualitative research involving smaller corpora is used to inform
inter-varietal, quantitative corpus research.
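
The concordancing step described above, in which each occurrence of a key word is viewed in its immediate environment, can be illustrated with a simple key-word-in-context (KWIC) routine. This is a minimal sketch under stated assumptions: the function name concordance, the width parameter and the sample sentence are hypothetical, and dedicated concordancing software offers far more (sorting, collocation statistics, links back to source files).

```python
import re


def concordance(text, node, width=4):
    """Return (left context, node word, right context) for each occurrence of node.

    Context is measured in tokens, with up to `width` tokens on each side.
    Matching is case-insensitive; original casing is kept in the output.
    """
    tokens = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text)
    target = node.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


if __name__ == "__main__":
    # Invented sample text, not taken from LCIE or any other corpus.
    sample = "I would say it is fine but would you go"
    for left, word, right in concordance(sample, "would", width=2):
        print(f"{left:>12}  {word}  {right}")
```

Reading the resulting lines side by side is what allows the analyst to separate, for example, would used as a hedge from would used in a question, which a raw frequency count cannot do.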
Corpus-based variational pragmatic analysis can be further complemented by the
demographic speaker information that accompanies conversations contained in many
modern spoken corpora, thereby allowing both a micro- and macro-social
interpretation of the corpus results. O’Keeffe and Adolphs (2008) analyse the form
and function of response tokens across British and Irish English. To examine form,
they analysed two one-million word corpus samples from CANCODE and LCIE.
From these samples, they generated word and cluster lists and these were manually
cross-checked with transcripts using concordancing. They demonstrate that, in terms
of overall frequency, listener response tokens are far more frequent in British
English than in Irish English. In order to compare the data functionally, they
analysed two 20,000 word subcorpora of casual conversation taken from LCIE and
CANCODE. The demographic information provided by CANCODE and LCIE
allowed them to closely match their data in terms of gender, age, social relationship,
socio-economic class and genre of discourse. Accordingly, in both subcorpora the
participants were female university students in shared accommodation who were
close friends and of similar age (around 20). By controlling for macro-social
categories of gender, age and socio-economic class, O’Keeffe and Adolphs were
able to make an accurate generalisation across two varieties of the same language.
They again found that listener response tokens were more frequent among the
British participants. However, their analysis revealed no pragmatic variation in the
function of the response tokens across the two subcorpora. Orpin (2005) maintains
that corpus analysis allows the researcher to construct a detailed ‘semantic profile’
of a word. Similarly, the synergy of the variational pragmatic research agenda with a
corpus linguistic methodology allows those working in variational pragmatics to
construct a detailed ‘pragmatic profile’ of individual words, clusters or acts.
4.3.2 The issue of representativeness
As already mentioned, representativeness is an issue that has been highlighted by
Schneider and Barron (2008) as a weakness of previous, cross-cultural pragmatic
studies. In Chapter 3, it was also proposed that the community of practice offers
variational pragmatics a vehicle through which generalisations may be made (see
Section 3.5). In relation to a corpus methodology, Leech (1991: 27) maintains that a
corpus is representative if ‘findings based on its contents can be generalised to a
larger hypothetical corpus.’ Therefore, in the case of a corpus said to represent a
language variety, it is in fact representative if its findings can be generalised to the
said language variety. According to Tognini-Bonelli (2001: 2), ‘a corpus can be
defined as a collection of texts assumed to be representative of a particular language
put together so that it can be used for linguistic analysis’ (see also Sinclair, 2004).
She maintains that a corpus is constructed with a number of underlying assumptions:
the language is naturally occurring; it is gathered according to explicit design
criteria; it has a specific purpose in mind; and it has a claim to represent larger
chunks of language selected according to a specific typology (see also Biber, 1993).
These assumptions were primary considerations when the design and construction of
both SettCorp and TravCorp were undertaken. As discussed, both TravCorp and
SettCorp contain a collection of texts that were gathered according to specific design
criteria (Section 4.1). In terms of the representativeness and balance of the texts to
be included in the corpora, a similar approach to that of CANCODE was adopted.
Both corpora were designed to ensure that McCarthy’s (1998) three conversational
goal-types, collaborative idea, collaborative task and information provision, were
included. Collaborative ideas, according to McCarthy (ibid.: 10), ‘are concerned with
the interactive sharing of thoughts, judgements, opinions and attitudes.’
Collaborative task refers to conversational participants interacting with their physical
environment while talking and information provision ‘is primarily uni-directional,
with one party imparting information to others’ (ibid.). Thus SettCorp features, for
example, the family putting up the Christmas tree (collaborative task), talking about
being a student in university (collaborative idea) and providing information about a
city one of them is going to visit (information provision). Similarly, TravCorp
contains goal-types such as the family cleaning their home (collaborative task),
discussing the ownership of a mobile phone (collaborative idea) and relating a
workplace story (information provision).
Both TravCorp and SettCorp are domain-specific, specialised corpora in that they
represent a particular register (family discourse) and, therefore, questions of
representativeness and balance should be considered with this in mind. In order to
examine to what extent findings from these corpora can be generalised to a larger
corpus, frequency lists were generated for TravCorp, SettCorp and a reference
corpus, the Limerick Corpus of Irish English (LCIE). The frequency lists for
SettCorp and LCIE are shown in direct comparison in Table 4.4 (types with similar
frequency are marked , notable differences between the two corpora are marked
):