In item response theory (IRT), the underlying latent variables are typically assumed to be normally distributed. When this normality assumption is violated, item and person parameter estimates can become biased. It is therefore necessary in practical data analysis to examine the adequacy of this assumption in an effective manner. There has been a recent surge of interest in limited-information overall goodness-of-fit test statistics for IRT models (see e.g., Cai, Maydeu-Olivares, Coffman, & Thissen, 2006; Joe & Maydeu-Olivares, 2010; Cai & Hansen, 2013), but their appropriateness for diagnosing latent variable distributional fit has not been studied.
The approach undertaken in this research is to use summed score likelihood based indices. The idea itself is not new (see e.g., Ferrando & Lorenzo-Seva, 2001; Hambleton & Traub, 1973; Lord, 1953; Ross, 1966; Sinharay, Johnson, & Stern, 2006; Thissen & Wainer, 2001), but this study recasts the problem within the framework of limited-information goodness-of-fit testing. The summed score based indices can be viewed as a particular reduction of the full underlying multinomial that is potentially sensitive to latent variable distributional misspecification.
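To fix ideas, the general construction can be sketched as follows for $n$ dichotomous items; the notation is introduced here only for illustration, and the specific indices discussed below may differ in detail. Let $T_i(\theta)$ denote the model-implied probability of a correct response to item $i$ given the latent variable $\theta$, and let $L_k(s \mid \theta)$ denote the conditional probability of summed score $s$ on the first $k$ items. Starting from $L_1(1 \mid \theta) = T_1(\theta)$ and $L_1(0 \mid \theta) = 1 - T_1(\theta)$, the Lord-Wingersky recursion

\[
L_k(s \mid \theta) = L_{k-1}(s \mid \theta)\,[1 - T_k(\theta)] + L_{k-1}(s - 1 \mid \theta)\,T_k(\theta)
\]

yields the conditional summed score distribution, and the marginal probabilities $\pi_s = \int L_n(s \mid \theta)\,\phi(\theta)\,d\theta$ follow by integrating over the assumed latent density $\phi$, in practice via quadrature. A Pearson-type comparison of the observed summed score proportions $p_s$ with the model-implied $\pi_s$, for example

\[
X^2 = N \sum_{s=0}^{n} \frac{(p_s - \pi_s)^2}{\pi_s},
\]

collapses the full multinomial into only $n + 1$ summed score categories, a margin whose shape is directly governed by the assumed latent density.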
Results from a pilot study (Li & Cai, 2012) show that summed score likelihood based indices enjoy high statistical power for detecting violations of the latent variable distributional assumption, and are (correctly) insensitive to other forms of model misspecification such as unmodeled multidimensionality. Meanwhile, the limited-information overall fit statistic M2 (Maydeu-Olivares & Joe, 2005) has relatively low power against latent variable non-normality. However, the statistics proposed by Li and Cai (2012) do not exactly follow a chi-squared distribution. They proposed a heuristic degrees-of-freedom adjustment, but more rigorous justifications could be developed along the lines of the Satorra-Bentler type moment adjustments popular in structural equation modeling (Satorra & Bentler, 1994). In IRT, moment adjustment approaches have been used by Cai et al. (2006) and Maydeu-Olivares (2001).
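The generic form of such a moment adjustment can be sketched as follows, again with notation introduced only for illustration. Under standard regularity conditions, a quadratic-form fit statistic $T$ that is not asymptotically chi-squared is instead asymptotically distributed as a weighted sum of independent one-degree-of-freedom chi-squared variates, $T \rightarrow \sum_i \omega_i \chi^2_1$, where the weights $\omega_i$ are eigenvalues of a matrix involving the asymptotic covariance matrix of the residuals. A first-order (mean) correction rescales $T$ so that its asymptotic mean matches a nominal reference distribution with $d$ degrees of freedom,

\[
T_1 = \frac{d}{\sum_i \omega_i}\, T,
\]

referred to $\chi^2_d$, while a second-order (mean-and-variance) correction of the Satterthwaite type matches the first two moments,

\[
T_2 = \frac{\sum_i \omega_i}{\sum_i \omega_i^2}\, T, \qquad d^* = \frac{\bigl(\sum_i \omega_i\bigr)^2}{\sum_i \omega_i^2},
\]

referred to $\chi^2_{d^*}$. These are only the generic forms of the adjustment; the particular weights appropriate for the summed score likelihood based indices depend on how the summed score margins are formed and on how the item parameters are estimated.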
The major methodological contributions of my dissertation come from simulation studies that examine the calibration and power of the moment-adjusted test statistics across various conditions: number of items, sample size, item type, generating latent variable distribution, and the values of the generating item parameters. The performance of these fit statistics is also compared with that of M2. Simulation results show that the proposed moment-adjusted statistics improve upon the unadjusted statistics under both null and alternative conditions, especially when the generating item parameters are dispersed. Finally, the performance of the indices is illustrated with empirical data from educational and psychological assessment development projects.