Automated Essay Scoring in Innovative Assessments of Writing from Sources
This study examined automated essay scoring for experimental tests of writing from sources. These tests, part of the CBAL research initiative at ETS, embed writing tasks within a scenario in which students read and respond to sources. Two large-scale pilots are reported: a 2009 administration that piloted four writing assessments, and a 2011 administration that included two writing assessments and two reading assessments. Human raters applied two different rubrics to each prompt: a general rubric intended to measure only those skills for which automated essay scoring provides relatively direct measurement, and a genre-specific rubric focusing on specific skills such as argumentation and literary analysis. An automated scoring engine (e-rater) was trained on part of the 2009 dataset and cross-validated against the remaining 2009 data and all of the 2011 data. The results indicated that automated scoring can achieve operationally acceptable levels of accuracy in this context. However, the differentiation between the general rubric and the genre-specific rubric reinforces the need to achieve full construct coverage by supplementing automated scoring with additional sources of evidence.