Item cluster-based assessment: Modeling and design
This three-paper dissertation explores item cluster-based assessments, first in general as they relate to modeling, and then with respect to specific issues surrounding a particular item cluster-based assessment designed
The structure of a psychometric model should be reasonably analogous to the cognitive theory on which the assessment is based. Specifically, for item response theory (IRT) models in educational assessment, the model should reflect the structure of dependencies among items that are designed as item clusters (groups of items that share common stimulus material, etc.). This type of designed local item dependence (LID) can be modeled in many different ways. The literature on the existence of LID and on models developed to account for it is fairly extensive, but little work has been done to unify and organize these different approaches. The first paper presents a general framework to guide modeling decisions for item cluster-based assessments by formalizing some of the terminology used in the context of LID, providing an overview of methods for detecting LID, and discussing general modeling approaches for response data that are theorized to exhibit LID.
Recent pushes for increased rigor and a focus on complex constructs (such as critical thinking) in K-16 education highlight the need to develop assessments that measure these constructs. The second paper explores these issues in the context of three complex constructs in statistics education: Linking Data to a Claim (LDC), Meta-Representation Competence (MRC), and Formal Inference (FoI). We present a multidimensional treatment and analysis of field test data for the Critical Reasoning for College-Readiness (CR4CR) Assessment, an item cluster-based assessment. We found that the LDC and FoI items as written can provide a mapping of student ability estimates to the construct map levels as defined, but that the MRC items do not. Further, as expected, we found moderately strong correlations among the three constructs.
The third paper describes the design of selected response items based on open-ended counterparts for the CR4CR Assessment, and empirical comparisons of the two formats. It is commonly thought that multiple choice (or selected response) test items do not provide educators with useful information about higher-level thinking skills such as argumentation or critical thinking. However, there is also a need for diagnostic assessments that give educators timely feedback on student performance so that instruction can be adapted or interventions administered based on student needs. Although existing literature suggests that selected response items are, in general, easier than constructed response items, we found that this may not be the case for all constructs: for the LDC and FoI constructs, multi-select multiple choice items behaved similarly to their constructed response counterparts.