Bayesian and Non-parametric Approaches to Missing Data Analysis
- Author(s): Yu, Yao
- Advisor(s): Li, Jun;
- Yu, Yaming
- et al.
Missing data occur frequently in surveys, clinical trials, and other real-data studies. To analyze incomplete data, one must correctly identify the missing-data mechanism and then adopt appropriate statistical procedures. The analysis of missing data has attracted growing attention in recent years and is now being investigated in several different areas. This dissertation comprises two projects. The first proposes a Bayesian solution to data analysis with non-ignorable missingness. The second develops a non-parametric test of the missing-data mechanism for incomplete multivariate data.
First, Bayesian methods are proposed to detect non-ignorable missingness and to eliminate potential bias in estimators when non-ignorable missingness is present. Two hierarchical linear models, a pattern mixture model and a selection model, are applied to a real data example: the National Assessment of Educational Progress (NAEP) education survey data. The results show that the Bayesian methods can correctly recognize the missingness mechanism and provide model-based estimators that can eliminate the possible bias due to non-ignorable missingness. We also evaluate the goodness of fit of the two proposed models using two methods: comparison of the real data with the posterior predictive distribution, and residual analysis by cross-validation. A simulation study compares the performance of the two proposed Bayesian methods with that of the traditional design-based methods under different missing-data mechanisms and shows the good properties of the Bayesian methods. Further, we discuss three commonly used model selection criteria: the Bayes factor, the deviance information criterion (DIC), and the minimum posterior predictive loss approach. Because the Bayes factor is complicated to compute and the DIC is subject to uncertainty, we apply only the last approach, which, however, fails to correctly identify the true model structure for the hierarchical linear model.
Second, as an alternative to the fully specified model-based Bayesian method, a novel non-parametric test is proposed to detect the missing-data mechanism for multivariate missing data. The proposed test does not require any distributional assumptions and is proven to be consistent. A simulation study demonstrates that it has well-controlled type I error and satisfactory power against a variety of alternative hypotheses.
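To illustrate why identifying the missing-data mechanism matters, the following is a minimal sketch, not taken from the dissertation, that compares the complete-case mean under two hypothetical mechanisms. The standard normal outcome, the constant observation probability of 0.6, and the logistic dependence of the observation probability on the value itself are all assumptions chosen for illustration; the function name `simulate_complete_case_means` is likewise hypothetical.

```python
import math
import random


def simulate_complete_case_means(n=20000, seed=0):
    """Return (full mean, complete-case mean under MCAR, complete-case mean under MNAR)."""
    rng = random.Random(seed)

    # Assumed outcome: standard normal, so the true mean is 0.
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Missing completely at random: each value is observed with
    # the same probability (0.6), regardless of its size.
    mcar_observed = [v for v in y if rng.random() < 0.6]

    # Non-ignorable missingness (assumed logistic form): larger values
    # are more likely to be observed, so the observed sample is skewed.
    mnar_observed = [v for v in y if rng.random() < 1.0 / (1.0 + math.exp(-v))]

    def mean(values):
        return sum(values) / len(values)

    return mean(y), mean(mcar_observed), mean(mnar_observed)


full_mean, mcar_mean, mnar_mean = simulate_complete_case_means()
print(f"full: {full_mean:.3f}  MCAR: {mcar_mean:.3f}  MNAR: {mnar_mean:.3f}")
```

Under these assumptions the complete-case mean stays close to the full-data mean when missingness is completely at random, but is shifted upward when the chance of being observed depends on the unobserved value. This is the kind of bias that a model for the missingness mechanism is meant to remove.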