considered a minimal passing grade).
Construct-Related Evidence
A third kind of evidence that can support validity, but one that does not play as large a role
for classroom teachers, is construct-related validity, commonly referred to as construct
validity. A construct is any theory, hypothesis, or model that attempts to explain observed
phenomena in our universe of perceptions. Constructs may or may not be directly or
empirically measured; their verification often requires inferential data.
D. Authenticity
A fourth major principle of language testing is authenticity, a concept that is a little
slippery to define, especially within the art and science of evaluating and designing tests.
Bachman and Palmer (1996, p. 23) define authenticity as “the degree of correspondence of
the characteristics of a given language test task to the features of a target language task,”
and then suggest an agenda for identifying those target language tasks and for
transforming them into valid test items.
E. Washback
A facet of consequential validity, discussed above, is “the effect of testing on teaching and
learning” (Hughes, 2003, p. 1), otherwise known among language-testing specialists as
washback. In large-scale assessment, washback generally refers to the effects the test has
on instruction in terms of how students prepare for the test. “Cram” courses and “teaching
to the test” are examples of such washback. Another form of washback that occurs more in
classroom assessment is the information that “washes back” to students in the form of
useful diagnoses of strengths and weaknesses. Washback also includes the effects of an
assessment on teaching and learning prior to the assessment itself, that is, on preparation
for the assessment.
F. Applying Principles to the Evaluation of Classroom Tests
The five principles of practicality, reliability, validity, authenticity, and washback go a long
way toward providing useful guidelines for both evaluating an existing assessment
procedure and designing one on your own. Quizzes, tests, final exams, and standardized
proficiency tests can all be scrutinized through these five lenses.
Are the test procedures practical?
Is the test reliable?
Does the procedure demonstrate content validity?
Is the procedure face valid and “biased for best”?
Are the test tasks as authentic as possible?
Does the test offer beneficial washback to the learner?