Table 4.2: CANCODE’s matrix of speech genres (operationalised) [3]

| Context type  | Collaborative Idea                             | Collaborative Task                              | Information Provision            |
|---------------|------------------------------------------------|-------------------------------------------------|----------------------------------|
| Pedagogical   | Group tutorial                                 | Individual tutorial - discussing student's work | Lecture                          |
| Professional  | Collaborative office meeting                   | Colleagues moving furniture                     | Work presentation                |
| Transactional | Chatting with bank clerk                       | Buying a stereo system                          | Commentary by library tour guide |
| Socialising   | Chatting with friends about shared experiences | Assembling shelves                              | Telling jokes                    |
| Intimate      | Discussing family matters                      | Cooking together                                | Relating story of film seen      |

This genre-based approach, according to McCarthy (ibid: 9), ‘offers the possibility
of linking their [the data] contextual and social features directly with the lexico-
grammatical ‘nuts and bolts’ of their step-by-step creation.’ More recently, the
Limerick Corpus of Irish English (LCIE), a one-million-word corpus of naturally
occurring spoken Irish English, was built to parallel CANCODE’s matrix of speech
genres and to allow for a full description of spoken Irish English in these contexts
(for details of the design of LCIE see Farr et al., 2004).
There are also a number of register-specific spoken corpora available to the language
researcher. The Michigan Corpus of Spoken Academic English (MICASE), designed
to examine the characteristics of contemporary American academic speech, has
approximately 1.8 million words. The MICASE designers also employed context-
governed criteria in collecting the data. The corpus contains speech events across the
major academic disciplines in a university, for example, biological and health
sciences, physical sciences and engineering, and humanities and the arts.
Demographic information such as age, gender, academic role and first language
was also recorded. Recently, two additional corpora, the British Academic Spoken
English corpus (BASE) and the Limerick-Belfast Corpus of Academic Spoken
English (Li-Bel CASE), have been designed as companions to MICASE. The BASE
corpus contains 1.6 million words, whereas Li-Bel CASE, when completed, will
hold one million words. In addition, there are a number of spoken corpora that
represent specific social groupings. For example, the Bergen Corpus of London
Teenage Language (COLT) is a half-million-word corpus of spontaneous teenage
talk. This corpus distinguishes between speaker-specific (for example, gender, age
and social class) and context-specific (location and setting) information.

[3] Adapted from McCarthy (1998: 10).

The major spoken corpora surveyed thus far are substantial, each containing at least
half a million words. In relation to corpus size, Sinclair (2004: 189) maintains that
‘there is no virtue in being small. Small is not beautiful; it is simply a limitation.’
Nevertheless, it may be the case that small corpora are more adept than
larger ones at explaining the fine-grained distinctions that exist between registers.
Biber et al.’s (1999) forty-million-word Longman Spoken and Written English
Corpus (LSWE) is divided into six registers: conversation, fiction, newspaper
language, academic prose, non-conversational speech and general prose. However,
within each of these registers is an enormous amount of variation. For example,
Hunston (2002) notes that newspaper language contains a variety of newspaper types
(for example, broadsheet and tabloid) in addition to a range of article types (hard
news, letters, sport, business etc.). Indeed, it could be argued that conversation
contains an even wider range of types. Therefore, for larger corpora such as the
one used in Biber et al.’s (1999) grammar, ‘to make distinctions between ‘smaller’
registers would quickly become unmanageable’ (Hunston, 2002: 161). Small-corpus
studies, by contrast, have highlighted a range of variation that exists both within and
between different language varieties and registers.
Small corpora have allowed researchers to identify linguistic characteristics of
particular spoken registers. Vaughan (2007, 2008) uses a 40,000 word corpus of
meetings of English language teachers (C-MELT) to explore particular linguistic
features characteristic of this community of practice. For example, the size of C-
MELT allowed specific instances of humour to be isolated in order that they might
be assigned a function. Vaughan (2007: 186) found that teachers ‘use [humour] to
establish the social space they share, and implicitly define who they are, and what
their attitude is to the work they do.’ Farr (2007) claims that, in relation to teacher
education, ‘a spoken language corpus can be a valuable instrument in the toolbox of
professional development’ (p. 254). Farr’s 80,000 word POTTI corpus has allowed
the identification of areas for development and also areas of professional strength
within this context. For example, Farr (2005) explores the use of relational strategies
present in the data to demonstrate how trainers work to lessen asymmetrical speech
relationships. She claims that small talk, in particular talk about health issues, is a
typical way of establishing solidarity between speakers in this context. Furthermore,
she demonstrates how shared socio-cultural references such as múinteoir, the Gaelic
word for ‘teacher’, are a method of diluting institutional power on the part of the
teacher trainer in interaction with the trainee.
O’Keeffe (2005) employs a 55,000 word corpus of radio phone-in data to focus on
question forms in this context and illustrates that, although many asymmetrical
norms of institutional discourse apply, there is widespread
downtoning of power at a lexico-grammatical level. In addition to using hedges, the
presenter of the radio show employs a variety of features such as first name
vocatives, latching and reflexive pronouns, as in ‘What are you doing with yourself
nowadays?’, to create a ‘pseudo-intimate’ (p. 340) environment between speaker and
caller. Koester (2006) investigated a 34,000 word corpus of American and British
office talk and demonstrated the influence of local contexts on frequency and use of
various words or patterns. For example, she found that modal verbs of obligation
are more frequent in collaborative genres (for example, decision making or
planning) than in unidirectional genres (for example, giving instructions). Finally,
Cutting (2001) investigated the evaluative speech acts of six students as they became
members of an academic discourse community, on a taught Master’s course in
Applied Linguistics. Cutting isolated and tagged each of these speech acts and found
that positive acts increase as the course progresses and participants build solidarity.
She also found that negative speech acts are most common in conversations about
the course. Cutting deliberately limited the corpus used to 26,000 words so that she
‘could become familiar enough with each one’s [participant’s] linguistic
idiosyncrasies, personalities and attitudes to interpret the findings’ (pp. 1208-1209),
an approach that would be very difficult with a larger corpus. Similarly, in this
study, it is proposed that the datasets used provide a basis for a more in-depth
interpretation of the linguistic characteristics of both families. The data
from the settled family and from the Traveller family will hereafter be referred
to as SettCorp and TravCorp respectively.