This dissertation comprises three papers that propose, discuss, and illustrate models for making improved inferences in research on student achievement. Addressing the types of questions common in educational research today requires three different "extensions" to traditional educational assessment: (1) explanatory, trying to explain (or "diagnose") results regarding students or test items using features of the students and items themselves; (2) longitudinal, modeling change using responses from multiple assessments over time; and (3) multilevel, accounting for higher-level groupings such as classrooms or schools. The papers in this dissertation lie at the intersection of these three areas. Each paper develops a specific statistical or psychometric method with application to educational research.
The first paper proposes and assesses a method for (secondary) data analysis when the outcome variable in a multilevel model is latent and therefore measured with error. The goal is a method that is convenient and readily understandable for applied research. The best current approach for this type of analysis is plausible values methodology, which requires that the latent regression model used to construct the plausible values match the intended secondary analysis. In current practice, plausible values are constructed with a single-level regression as the conditioning model, which leads to biased estimates of the variance components when the secondary analysis uses a multilevel model. The method proposed in this paper uses weighted likelihood estimates (WLEs) of the latent variable, which do not rely on the specification of a conditioning model, as the dependent variable for the multilevel model. The method explicitly accounts for measurement error in the WLEs by fixing part of the level-1 residual variance equal to the estimated variance of the WLEs. The performance of the proposed method is evaluated and compared to the plausible values method using simulation studies and an empirical example.
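The idea of fixing part of the level-1 residual variance can be illustrated with a minimal sketch. This is not the estimation procedure used in the paper: it substitutes a simple ANOVA (method-of-moments) estimator for likelihood-based multilevel estimation, and it assumes a balanced two-level design, simulated data, and a single known error variance shared by all persons (in practice, WLE error variances differ across persons). The function name `variance_components` and all numeric values are hypothetical.

```python
import numpy as np

def variance_components(y, error_var):
    """Method-of-moments variance components for a balanced random-intercept model.

    y         : array of shape (n_groups, n_per_group) holding error-prone scores
    error_var : measurement-error variance treated as known (assumed constant here)
    Returns (between-group variance, level-1 variance net of measurement error).
    """
    n_groups, n_per = y.shape
    group_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Within-group mean square: estimates level-1 variance plus error variance.
    msw = ((y - group_means[:, None]) ** 2).sum() / (n_groups * (n_per - 1))
    # Between-group mean square: estimates n_per * tau2 + level-1 + error variance.
    msb = n_per * ((group_means - grand_mean) ** 2).sum() / (n_groups - 1)
    tau2 = (msb - msw) / n_per
    # Remove the known measurement-error part from the level-1 residual variance.
    sigma2 = msw - error_var
    return tau2, sigma2

# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_groups, n_per = 500, 20
tau2_true, sigma2_true, error_var = 0.3, 0.7, 0.25

u = rng.normal(0.0, np.sqrt(tau2_true), size=(n_groups, 1))        # group effects
e = rng.normal(0.0, np.sqrt(sigma2_true), size=(n_groups, n_per))  # person residuals
theta = u + e                                                      # latent outcome
y = theta + rng.normal(0.0, np.sqrt(error_var), size=(n_groups, n_per))  # noisy scores

tau2_hat, sigma2_hat = variance_components(y, error_var)
tau2_naive, sigma2_naive = variance_components(y, 0.0)  # ignores measurement error

print(f"corrected: tau2={tau2_hat:.3f}, sigma2={sigma2_hat:.3f}")
print(f"naive:     tau2={tau2_naive:.3f}, sigma2={sigma2_naive:.3f}")
```

In this sketch, ignoring measurement error inflates the level-1 variance by exactly the error variance and leaves the between-group estimate unchanged, which is why the correction is applied only at level 1.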
The second paper proposes extensions to existing item response models for the purpose of evaluating educational interventions. The proposed models incorporate information about the design of the intervention in order to obtain more nuanced information regarding the efficacy of an intervention from the assessment data. The models combine longitudinal growth on the person side, which provides information about overall efficacy, with group- and time-varying item feature effects, which provide information about factors that may contribute to differences in growth over time. The proposed models are applied to empirical data from a new lesson sequence for elementary school mathematics. Particular attention is paid to issues of interpretation, item feature design and quality, and measurement invariance.
The third paper proposes a longitudinal item response model for differential growth based on initial status. The model is designed to answer research questions about for whom an instructional sequence or educational program is more (or less) effective and whether the instruction or program is expected to narrow or widen an existing achievement gap. The proposed model encompasses different conceptions of initial status; these conceptions can be examined simultaneously to uncover whether growth is predicted by factors common across the assessments or by factors specific to the assessment at the initial time. The identification and estimation of the proposed model and equivalent models are discussed; parameter recovery is assessed via simulation. The use and interpretation of the model are illustrated with empirical data.