This three-paper dissertation explores problems with the use of standardized tests as outcome measures for the evaluation of instructional interventions in mathematics and science. Investigators commonly use students’ scores on standardized tests to evaluate the impact of instructional programs designed to improve student achievement. However, evidence suggests that standardized tests may not measure, or may not measure well, the student learning caused by the interventions. This problem is a special case of a basic problem in applied measurement: understanding whether a particular test provides accurate and useful information about the impact of an educational intervention. The three papers explore different aspects of the issue and highlight the potential benefits of (a) using particular research methods and (b) implementing changes to educational policy that would strengthen efforts to reform instructional intervention in mathematics and science.
The first paper investigates measurement problems related to the use of standardized tests in applied educational research. An analysis of the research projects funded by the Institute of Education Sciences (IES) Mathematics and Science Education Program addresses three main research questions. One, how often are standardized tests used to evaluate new educational interventions? Two, do the tests appear to measure the same thing that the intervention teaches? Three, do investigators establish validity evidence for the specific uses of the test? The research documents both potential and actual problems related to the use of standardized tests in leading applied research, and suggests changes to policy that would address measurement issues and improve the rigor of applied educational research.
The second paper explores the practical consequences of misalignment between an outcome measure and an educational intervention in the context of summative evaluation. The paper uses simulated evaluation data and a psychometric model of alignment, grounded in item response modeling, to address the following research question: how do differences between what a test measures and what an intervention teaches influence the results of an evaluation? The simulation derives a functional relationship between alignment, defined as the match between the test and the intervention, and treatment sensitivity, defined as the statistical power for detecting the impact of an intervention. The paper presents a new model of the effect of misalignment on the results of an evaluation and offers recommendations for outcome measure selection.
The third paper documents the educational effectiveness of the Learning Mathematics through Representations (LMR) lesson sequence for students classified as English Learners (ELs). LMR is a research-based curricular unit designed to support upper elementary students’ understandings of integers and fractions, areas considered foundational for the development of higher mathematics. The experimental evaluation includes a multilevel analysis of achievement data from two assessments: a standardized test and a researcher-developed assessment. The study coordinates these two sources of data with a theoretical mechanism of action in order to rigorously document the effectiveness and educational equity of LMR for ELs.