Assessment and Examinations
machine. The answers are usually recorded non-linguistically,
by a tick or a cross in a box, a circle round a number or letter,
or the writing of a letter or number. Occasionally an actual
word or punctuation mark may be used. Typically such tests
take the multiple-choice format or a blank-filling format but
no real linguistic judgment is required of the marker.
Subjective tests on the other hand can only be marked by
human beings with the necessary linguistic knowledge, skill
and judgment. Usually the minimum requirement for an
answer is a complete sentence, though sometimes single
words may be sufficient. It must be recognised, however, that
the creation and setting of both kinds is ultimately subjective,
since the choice of items, their relative prominence in the test
and so on are matters of the knowledge, skill and judgment of
the setter. Furthermore, evaluating a piece of language like a
free composition is an almost entirely subjective matter, a
question of individual judgment, and quasi-analytic
procedures like allocating so many marks for spelling, so
many for grammar, so many for ‘expression’ and so on do
almost nothing to reduce that fundamental subjectivity. A
checklist of points to watch may help to make the marking
more consistent but it is well to recognise that the marking is
none the less subjective.
It is frequently claimed that the results obtained from
objective tests are ‘better’ than those obtained from
subjectively marked tests or examinations, and books like
the classic The Marking of English Essays by P. Hartog et al.
with their frightening picture of the unreliability and
inconsistency of marking in public examinations give good
grounds for this claim. However, there are two devices which
may be used to improve the consistency and reliability of
subjective marking. One is to use the Nine Pile Technique
and the other is to use multiple marking.
The Nine Pile Technique is based on the assumption that
in any population the likelihood is that the distribution of
abilities will follow a normal curve, and that subjective
judgments are more reliable over scales with few points on
them than over scales with a large number of points on them.
In other words, a five-point scale will give reasonable results;
a fifty-point scale will not. Suppose a teacher has ninety-nine
essays to mark. He will begin by reading these through
quickly and sorting them into three piles on the basis of a
straight global subjective evaluation: Good, Middling, Poor.
In order to get an approximately normal distribution he
would expect about seventeen of the ninety-nine to be Good,
sixty-five to be Middling, and seventeen to be Poor. Next he
takes the Good pile and sorts these on the basis of a second
reading into Outstanding, Very Good, and Good piles. In the
Outstanding pile he might put only one essay, in the Very
Good pile four, and the remaining twelve in the Good pile.
Similarly he would sort the Poor pile into Appalling, Very
Poor, and Poor with approximately the same numbers.
Finally he would sort the Middling pile into three, Middling/Good,
Middling, and Middling/Bad, in the proportion of about twenty,
twenty-five, and twenty. This sorting gives a nine-point scale
which has been arrived at by a double
marking involving an element of overlap. Obviously if the
second reading requires a Middling/Bad essay to go into the
Poor pile or a Poor essay to go into the Middling pile, such
adjustments can easily be made. This technique has been
shown to give good consistency as between different markers
and the same marker over time.
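The arithmetic of the two sortings can be checked with a short
sketch. The Python below is illustrative only: the pile sizes are
those of the ninety-nine-essay example above, while the names
FIRST_SORT, SECOND_SORT and nine_point_scale are invented for this
illustration and form no part of the technique itself.

```python
# Pile sizes for 99 essays, taken from the worked example in the text.
# The variable and function names are hypothetical.
FIRST_SORT = {"Good": 17, "Middling": 65, "Poor": 17}

# Second reading: each first-sort pile is split into three finer piles,
# listed here from best to worst.
SECOND_SORT = {
    "Good": {"Outstanding": 1, "Very Good": 4, "Good": 12},
    "Middling": {"Middling/Good": 20, "Middling": 25, "Middling/Bad": 20},
    "Poor": {"Poor": 12, "Very Poor": 4, "Appalling": 1},
}


def nine_point_scale():
    """Return (pile name, expected count) pairs, ordered best to worst."""
    scale = []
    for coarse in ("Good", "Middling", "Poor"):
        fine = SECOND_SORT[coarse]
        # The second reading must redistribute the whole first-sort pile.
        if sum(fine.values()) != FIRST_SORT[coarse]:
            raise ValueError(f"{coarse} pile does not add up")
        scale.extend(fine.items())
    return scale


if __name__ == "__main__":
    for name, count in nine_point_scale():
        print(f"{name:14} {count:3}")
```

The nine counts total ninety-nine and are symmetrical about the
central Middling pile, which is what an approximately normal
distribution would lead one to expect.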
If this technique is then combined with multiple marking,
that is to say getting a second or third marker to re-read the
essays and to make adjustments between piles, the results are
likely to be even more consistent and reliable. There is a very
cogently argued case for multiple marking made out in Multiple
Marking of English Compositions by J. Britton et al.
Techniques such as these acknowledge the fundamentally
subjective nature of the assessments being made, but they
exploit the psychological realities of judgment-making in a
controlled way and this is surely sensible and useful. The
time required for multiple marking is no greater than that
required for using a conventional analytic mark allocation
system and there seems little justification for clinging to the
well worn and substantially discredited ways.
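One way of reconciling the piles awarded by several markers might
be sketched as follows. This is an assumption made purely for
illustration, not a procedure laid down in the text: the piles are
numbered 1 (Appalling) to 9 (Outstanding), the agreed pile is taken
to be the median of those awarded, and an essay is flagged for
re-reading when the markers differ by more than one pile. The
function name combine_markers and the one-pile threshold are
hypothetical.

```python
from statistics import median_low


def combine_markers(piles_by_essay):
    """Combine pile numbers (1 = worst, 9 = best) from several markers.

    piles_by_essay maps an essay identifier to the list of pile numbers
    awarded by each marker. Returns a dict mapping each identifier to
    (agreed pile, needs_review).
    """
    result = {}
    for essay_id, piles in piles_by_essay.items():
        # median_low always returns a pile that some marker actually awarded.
        agreed = median_low(piles)
        # Assumption: a spread of more than one pile is worth a re-reading.
        needs_review = max(piles) - min(piles) > 1
        result[essay_id] = (agreed, needs_review)
    return result
```

With two markers awarding piles 5 and 6 the essay would simply be
placed in pile 5; with three markers awarding 4, 5 and 7 it would
again be placed in pile 5 but set aside for a further reading.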
All of the above is little more than preliminary.
When the fundamentals of what assessing progress in learning
a foreign language really involves are considered, it becomes
apparent that it is the underlying theoretical view of
what language is and how it works that is most important.