Some Contributions to Smoothing Spline Density Estimation
- Author(s): Shi, Jian
- Advisor(s): Wang, Yuedong
- et al.
Density estimation plays a fundamental role in many areas including statistics and machine learning. The estimated density functions are useful for model building and diagnostics, inference, prediction, classification and clustering. The goal of our research is to develop new methods for density estimation and inference. This dissertation consists of three projects involving smoothing spline density estimation and inference.
In the first project, we apply the smoothing spline density estimation method to test for normality in both the univariate (Chapter 2) and multivariate (Chapter 3) settings. Using the fact that the null hypothesis is equivalent to the logistic transform of the density function belonging to the null space of a quintic spline, we construct new test statistics based on quintic polynomial spline and thin-plate spline estimates of the density function. We compare these new tests with some existing normality tests using simulations.
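The univariate equivalence can be sketched as follows. This is an illustration under the assumption that the quintic spline uses the usual third-derivative roughness penalty; the symbols \eta, \mu, \sigma and J are ours and do not appear in the abstract.

```latex
% Log density of N(\mu, \sigma^2): a quadratic polynomial in x.
\eta(x) = \log f(x)
        = -\frac{1}{2\sigma^{2}}\,x^{2}
          + \frac{\mu}{\sigma^{2}}\,x
          - \frac{\mu^{2}}{2\sigma^{2}}
          - \frac{1}{2}\log\!\left(2\pi\sigma^{2}\right)

% Assumed penalty for a quintic (order m = 3) polynomial spline:
J(\eta) = \int \left\{\eta^{(3)}(x)\right\}^{2}\,dx

% J vanishes exactly on polynomials of degree at most 2:
J(\eta) = 0 \iff \eta \in \operatorname{span}\{1,\; x,\; x^{2}\}
```

Under this assumption the normal log density incurs zero penalty, so testing normality amounts to testing whether the estimated function departs from the null space of the penalty.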
In the second project, we propose model-based penalties for smoothing spline density estimation and inference. These model-based penalties incorporate indefinite prior knowledge that the density is close to, but not necessarily in, a family of distributions. The Pearson family and a generalization of the generalized inverse Gaussian family are used to illustrate the derivation of penalties and reproducing kernels. We also propose new inference procedures to test the hypothesis that the density belongs to a specific family of distributions.
Maximum likelihood estimation within a parametric family and nonparametric estimation are two traditional approaches to density estimation. In practice, it is often desirable to model some components of the density function parametrically while leaving other components unspecified. In the third project, we study a general semiparametric density model, which contains many existing semiparametric density models as special cases. We develop computational procedures for different cases and study theoretical properties, including consistency and asymptotic distribution, for the semiparametric linear case. Extensive simulations show that the proposed computational methods perform well and that the semiparametric model can outperform many existing nonparametric and semiparametric density estimation methods. Real data applications are also provided.