eScholarship
Open Access Publications from the University of California

UCLA Electronic Theses and Dissertations

Application of Higher-order IRT models and Hierarchical IRT models to Computerized Adaptive Testing

Abstract

In recent years, the importance of formative assessments has been emphasized within educational measurement. This type of assessment often includes multiple correlated sub-domains and a hierarchical structure among the proficiencies. In this dissertation, several multidimensional CAT procedures are investigated to improve the measurement aspects of diagnostic testing and to better match the psychometric models to the test structure.

Five factors are manipulated with the higher-order IRT models and hierarchical IRT models: (1) the correlation between the two primary factors (low, medium, and high), (2) the number of group factors per primary factor (two and four), (3) the number of items (40, 80, and 160), (4) the item selection method (MFI and Bayesian), and (5) the proficiency score estimation method (MLE and EAP). Three outcome measures are computed: the correlation between true and estimated proficiency scores, the Root Mean Square Error (RMSE) of the estimated proficiency scores, and the Standard Error (SE), across a total of 192 conditions.
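As a minimal illustrative sketch (not taken from the dissertation), the three outcome measures can be computed from simulated true and estimated proficiency scores as follows; the sample size, error scale, and per-examinee SEs here are arbitrary assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                    # number of simulated examinees (illustrative)
theta_true = rng.normal(size=n)             # true proficiency scores
theta_hat = theta_true + rng.normal(scale=0.3, size=n)  # estimates with measurement error
se_hat = np.full(n, 0.3)                    # per-examinee standard errors (illustrative)

# The three outcome measures used to compare conditions:
corr = np.corrcoef(theta_true, theta_hat)[0, 1]         # correlation of true vs. estimated
rmse = np.sqrt(np.mean((theta_hat - theta_true) ** 2))  # Root Mean Square Error
mean_se = se_hat.mean()                                 # average Standard Error
```

In a full simulation study these statistics would be computed for each of the crossed conditions and then compared across cells of the design.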

As expected, the correlation between true and estimated proficiency scores increases, while RMSE and SE decrease, as the test length and the correlation between the two primary factors increase; this pattern holds across the different item selection methods and scoring methods. Overall, the higher-order IRT model CAT has an advantage over the hierarchical IRT model CAT when scores for the primary factors are needed. On the other hand, if test designers are interested in the more specific group factors, the hierarchical IRT models outperform the higher-order IRT models.

This study undertakes a comprehensive comparison of item selection and proficiency score estimation methods in several multidimensional IRT models in conjunction with CAT. The differences among item selection and proficiency score estimation methods are negligible across the four multidimensional IRT CAT algorithms. However, the Bayesian item selection method yields smaller RMSEs and SEs than the MFI method in specific cases, and the EAP scoring method outperforms the MLE method, especially for the short test lengths in this study.
