Application of Higher-order IRT models and Hierarchical IRT models to Computerized Adaptive Testing
- Lee, Moonsoo
- Advisor(s): Cai, Li
Abstract
In recent years, the importance of formative assessments has been emphasized within educational measurement. This type of assessment often includes multiple correlated sub-domains and a hierarchical structure among the proficiencies. In this dissertation, several multidimensional CAT procedures are investigated to improve the measurement aspects of diagnostic testing and to better match the psychometric models to the test structure.
Five factors are manipulated with higher-order IRT models and hierarchical IRT models: (1) the correlation between the two primary factors (low, medium, and high), (2) the number of group factors per primary factor (two and four), (3) the number of items (40, 80, and 160), (4) the item selection method (MFI and Bayesian), and (5) the proficiency score estimation method (MLE and EAP). Three outcome measures are computed across a total of 192 conditions: the correlation between true and estimated proficiency scores, the Root Mean Square Error (RMSE) of the estimated proficiency scores, and the Standard Errors (SE).
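As a minimal illustration of the first two outcome measures, the sketch below computes the correlation and the RMSE between true and estimated proficiency scores for one simulated condition. The function names and the example values are hypothetical and are not taken from the dissertation; the formulas are the standard Pearson correlation and root mean square error.

```python
import math


def rmse(true_scores, estimates):
    """Root mean square error between true and estimated proficiency scores."""
    n = len(true_scores)
    squared_errors = sum((e - t) ** 2 for t, e in zip(true_scores, estimates))
    return math.sqrt(squared_errors / n)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical true and estimated scores for a handful of simulated examinees.
theta_true = [-1.2, -0.4, 0.0, 0.7, 1.5]
theta_hat = [-1.0, -0.6, 0.1, 0.9, 1.3]

recovery_correlation = pearson_r(theta_true, theta_hat)
recovery_rmse = rmse(theta_true, theta_hat)
```

A higher correlation and a lower RMSE both indicate better recovery of the true scores, which is the sense in which the conditions are compared.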
As expected, the correlation between true and estimated proficiency scores increases, while RMSE and SE decrease, as the test length and the correlation between the two primary factors increase; this pattern holds across the different correlation conditions, item selection methods, and scoring methods. Overall, the higher-order IRT model CAT has an advantage over the hierarchical IRT model CAT when scores for the primary factors are needed. On the other hand, if test designers are interested in the more specific group factors, the hierarchical IRT models outperform the higher-order IRT models.
This study undertakes a comprehensive comparison of item selection methods and proficiency score estimation methods in several multidimensional IRT models in conjunction with a CAT. The differences among the item selection and proficiency score estimation methods are negligible across the four multidimensional IRT CAT algorithms. However, the Bayesian item selection method yields smaller RMSEs and SEs than the MFI method in specific cases, and the EAP scoring method outperforms the MLE method, especially for short test lengths.