Multiple Imputation of High-dimensional Mixed Incomplete Data
- Author(s): He, Ren
- Advisor(s): Belin, Thomas R
- et al.
It is common in applied research to have large numbers of variables of mixed data types (continuous, binary, ordinal or nominal) measured on a modest number of cases. Moreover, even a simple imputation model can be overparameterized when the number of variables is only moderately large, and finding a joint model that accommodates multivariate data of mixed types is challenging.

Here we develop two joint multiple imputation models. The first uses multivariate normal components for continuous variables and latent-normal components for categorical variables. Following the strategy of Boscardin and Weiss (2003) and using parameter-expanded Metropolis-Hastings estimation (Boscardin, Zhang and Belin 2008), we place a hierarchical prior on the covariance matrix that is centered around a parametric family. The second uses a factor analysis model to impute missing items and extends the approach of Song and Belin (2004).

The report is organized as follows. Chapter 1 briefly introduces the research problem. Chapter 2 reviews background material related to our two new approaches. Chapter 3 introduces two existing methods for handling high-dimensional continuous incomplete data, and Chapter 4 introduces two methods for handling mixed incomplete data. Our newly developed methods are outlined in Chapter 5. In Chapter 6, simulations under various conditions compare the results of our approaches with those of the rounding method (Bernaards et al. 2007) and of available-case analysis. In Chapter 7, our two approaches are applied to the California Health Interview Survey (CHIS) 2009 data set. Chapter 8 discusses several possible extensions and further directions for our methods.
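The latent-normal idea for categorical variables can be illustrated with a minimal, hypothetical sketch. An ordinal item is treated as a discretized normal latent variable Z with unit variance (the usual identifiability convention) and fixed cutpoints. One data-augmentation sweep draws Z from a truncated normal for observed cases and from the full normal for missing cases, and the missing category is then read off from the drawn Z. This is only the generic latent-variable device, not the joint model described above. The latent means, the cutpoints and all function names are illustrative assumptions, and the hierarchical covariance prior and the parameter-expanded Metropolis-Hastings updates are omitted.

```python
import bisect
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)
NEG_INF, POS_INF = float("-inf"), float("inf")


def draw_truncated_normal(mu, lower, upper, rng):
    """Draw Z ~ N(mu, 1) restricted to (lower, upper) by inverse-CDF sampling."""
    a = 0.0 if lower == NEG_INF else STD_NORMAL.cdf(lower - mu)
    b = 1.0 if upper == POS_INF else STD_NORMAL.cdf(upper - mu)
    u = rng.uniform(a, b)
    # inv_cdf is undefined at exactly 0 or 1, so keep u strictly inside (0, 1)
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mu + STD_NORMAL.inv_cdf(u)


def category_from_latent(z, cutpoints):
    """Map a latent value to an ordinal category 0..K-1 given sorted cutpoints."""
    return bisect.bisect_right(cutpoints, z)


def latent_normal_sweep(y, mu, cutpoints, rng):
    """One hypothetical data-augmentation sweep for a single ordinal variable.

    y: observed categories (ints), with None marking missing entries
    mu: latent means for each case (assumed known here)
    Returns the latent draws and a completed copy of y.
    """
    bounds = [NEG_INF, *cutpoints, POS_INF]
    z_draws, y_completed = [], []
    for y_i, mu_i in zip(y, mu):
        if y_i is None:
            # Missing: the latent value is unconstrained; the category follows from it.
            z_i = rng.gauss(mu_i, 1.0)
            y_i = category_from_latent(z_i, cutpoints)
        else:
            # Observed: the latent value must fall inside that category's interval.
            z_i = draw_truncated_normal(mu_i, bounds[y_i], bounds[y_i + 1], rng)
        z_draws.append(z_i)
        y_completed.append(y_i)
    return z_draws, y_completed


if __name__ == "__main__":
    rng = random.Random(2009)
    cutpoints = [-0.5, 0.5]  # three ordered categories: 0, 1, 2
    y = [0, 2, 1, None, None, 2]
    mu = [-0.3, 0.8, 0.0, 0.4, -1.0, 1.2]
    z, y_imputed = latent_normal_sweep(y, mu, cutpoints, rng)
    print([round(v, 2) for v in z])
    print(y_imputed)
```

In a full sampler the latent means would come from the joint normal model for all variables, and the cutpoints would be updated as well. Here both are held fixed so that only the imputation step is visible. Repeating the sweep across draws of the model parameters is what produces multiple imputations rather than a single fill-in.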