Rating Design for Field Trial
The field trial responses were rated between September and October 2007. To analyze the effects of the design factors (tasks, rating criteria, raters, student proficiencies), the following two rating designs were used. The first, the so-called multiple marking design, had all raters, working in groups of four, independently rate the same set of selected student responses within each group. For this purpose, 30 responses from each of the 13 booklets were randomly chosen and allocated to the rater groups in a Youden square design (Preece, 1990; see Frey et al., 2009). The Youden square design is a particular form of incomplete block design that, in our case, ensured a linkage of ratings across all booklets and an even distribution of rater combinations across booklets. The resulting linkage of students, tasks, and raters allowed us to perform variance component analyses motivated by g-theory, as described next.
The second design, the so-called single marking design, allocated all student responses randomly to all raters, with each rater rating an equal number of responses and each response being rated once. This design allowed us to control for systematic rater effects by ensuring an approximately balanced allocation of student responses across tasks to different raters.
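To make the allocation logic concrete, the following is a minimal Python sketch of an incomplete-block allocation in the spirit of the multiple marking design. It is not the actual Youden square used in the study: the rater IDs, the pool size of 500 responses per booklet, and the simple cyclic shifting of rater groups are illustrative assumptions.

```python
import random

# Illustrative sketch only -- NOT the exact Youden square from the study.
# Assumed quantities: 13 booklets, 30 sampled responses per booklet, rater
# groups of four drawn cyclically from a hypothetical pool of 13 raters.
N_BOOKLETS = 13
RESPONSES_PER_BOOKLET = 30
GROUP_SIZE = 4
POOL_PER_BOOKLET = 500            # assumed number of responses per booklet
raters = [f"R{i:02d}" for i in range(1, 14)]

random.seed(2007)
allocation = {}
for b in range(N_BOOKLETS):
    # Cyclically shifted groups: adjacent booklets share three raters, which
    # links ratings across booklets (the purpose of the Youden square here).
    group = [raters[(b + offset) % len(raters)] for offset in range(GROUP_SIZE)]
    # Every rater in the group scores the same randomly sampled responses
    # independently (multiple marking).
    sampled = sorted(random.sample(range(1, POOL_PER_BOOKLET + 1),
                                   RESPONSES_PER_BOOKLET))
    allocation[f"Booklet{b + 1:02d}"] = {"raters": group, "responses": sampled}

for booklet, plan in allocation.items():
    print(booklet, plan["raters"], plan["responses"][:5], "...")
```

A true Youden square would additionally balance how often each pair of raters co-occurs across booklets; the cyclic shift above only approximates that property.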
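By contrast, the single marking design can be sketched as a simple round-robin dealing of shuffled responses to raters. The function and its arguments below are hypothetical and only illustrate the balancing idea described above.

```python
import random

def single_marking_allocation(response_ids, raters, seed=2007):
    """Deal each response to exactly one rater, keeping rater loads balanced.

    Hypothetical helper illustrating the single marking design; the actual
    allocation in the study may have balanced additional factors (e.g. tasks).
    """
    rng = random.Random(seed)
    shuffled = list(response_ids)
    rng.shuffle(shuffled)
    plan = {r: [] for r in raters}
    for i, resp in enumerate(shuffled):
        plan[raters[i % len(raters)]].append(resp)  # round-robin dealing
    return plan

# Example: 100 responses split across 5 raters -> 20 responses per rater.
plan = single_marking_allocation([f"S{i:04d}" for i in range(1, 101)],
                                 [f"R{i:02d}" for i in range(1, 6)])
print({rater: len(resps) for rater, resps in plan.items()})
```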
The multifaceted Rasch analyses, which are described next, are based on the combined data from the two rating designs to ensure a sufficiently strong linkage between tasks, raters, and students.
Data Analysis
Given the previous considerations about quality control in rating procedures and the lack of a strong research base for level-specific approaches to assessing writing proficiency, the primary objective of the study reported here is to establish the psychometric qualities of the writing tasks using the ratings from the field trial. This is critical because the ratings form the basis of inferences about task difficulty estimates, raters' performance, and students' proficiency estimates. If the rating quality is poor vis-à-vis the design characteristics used in the field trial, the defensibility of any resulting narratives about the difficulty of the writing tasks and their alignment with the CEFR levels is compromised.
In more specific terms, the two research questions for this study based on the primary objective are as follows:

  • RQ1: What are the relative contributions of each of the design factors (tasks, criteria, raters, students) to the overall variability in the ratings of the HSA and MSA student samples?

  • RQ2: Based on the analyses in RQ1, how closely do empirical estimates of task difficulty and a priori estimates of task difficulty by task developers align? Is it possible to arrive at empirically grounded cut-scores in alignment with the CEFR using suitable statistical analyses?

We answer both research questions separately for students from the HSA and MSA school tracks. We do this primarily because preliminary calibrations with the writing data, which are not the main focus of this article, as well as related data from the reading and listening proficiency tests, have suggested that separate calibration leads to more reliable and defensible interpretations. This decision was also partly politically motivated by the need for consistent and defensible reporting strategies across the reading, listening, and writing proficiency tests.
To answer the first research question, we use descriptive statistics of the rating data as well as variance components analyses grounded in generalizability theory, which decompose the overall variation in ratings according to the relative contribution of each of the design factors listed previously. To answer the second question, we account for the interaction effects of our design facets on the variability of the ratings via multifaceted Rasch modeling. This is a parametric latent-variable approach that goes beyond identifying the influence of individual design factors to statistically correct for potential biases in the resulting estimates of task and criteria difficulty, rater performance, and student proficiency. Using descriptive statistics, g-theory analyses, and multifaceted Rasch model analyses in concert helps to triangulate the empirical evidence for the rating quality and to illustrate the different kinds of inferences supported by each analytic approach.
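For orientation, the two analytic approaches can be written generically as follows. These are the standard textbook forms, here with facets for persons, tasks, criteria, and raters, not necessarily the exact parameterization estimated in the study.

```latex
% G-theory: decomposition of observed score variance for a crossed
% persons (p) x tasks (t) x criteria (c) x raters (r) design.
\[
\sigma^2(X_{ptcr}) \;=\; \sigma^2_p + \sigma^2_t + \sigma^2_c + \sigma^2_r
  \;+\; \sigma^2_{pt} + \sigma^2_{pc} + \sigma^2_{pr} + \cdots
  \;+\; \sigma^2_{ptcr,e}
\]

% Many-facet Rasch (rating scale) model: log-odds of student n receiving
% category k rather than k-1 on task i and criterion j from rater r.
\[
\log\!\left(\frac{P_{nijrk}}{P_{nijr(k-1)}}\right)
  \;=\; \theta_n \;-\; \delta_i \;-\; \gamma_j \;-\; \alpha_r \;-\; \tau_k
\]
```

In the Rasch formulation, θ_n is the student's proficiency, δ_i the task difficulty, γ_j the criterion difficulty, α_r the rater's severity, and τ_k the threshold between adjacent rating categories; it is this additive correction for rater severity and task difficulty that distinguishes the Rasch analysis from the purely descriptive variance decomposition.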

