Automated generation of sentence reading fluency test items
Psychometric testing is a valuable educational tool for the assessment and monitoring of students’ abilities in core subjects. However, the manual development of these tests is a tedious process requiring test specialists to produce and curate large volumes of high-quality items. In this paper, we consider whether automating test item generation with modern machine learning methods is a feasible solution for obtaining strong psychometric test items at low cost in the domain of sentence reading fluency. We assess the ability of the large neural language model GPT-3 to produce items “few-shot”, that is, from a short prompt with only a handful of examples. Our results show that generated items closely resemble standardized test items in terms of their factual ambiguity, content appropriateness, and complexity. Furthermore, after filtering for correct answer labeling, these generated items possess similar latent psychometric properties to standardized test items, even capturing subtle grade-level variation.