Variable Selection via Penalized Likelihood
Maximum likelihood ratio theory has contributed tremendously to the success of parametric inference, owing to the fundamental theory of Wilks (1938). Yet there is no generally applicable approach for nonparametric inference based on function estimation. Maximum likelihood ratio test statistics may not exist in general in the nonparametric function estimation setting. Even if they exist, they are hard to find and cannot be optimal, as shown in this paper. In this paper, we introduce the sieve likelihood statistics to overcome the drawbacks of nonparametric maximum likelihood ratio statistics. A new Wilks phenomenon is unveiled. We demonstrate that the sieve likelihood statistics are asymptotically distribution free and follow χ²-distributions under the null hypotheses for a number of useful hypotheses and a variety of useful models, including Gaussian white noise models, nonparametric regression models, varying coefficient models and generalized varying coefficient models. We further demonstrate that sieve likelihood ratio statistics are asymptotically optimal in the sense that they achieve the optimal rates of convergence given by Ingster (1993). They can even be adaptively optimal in the sense of Spokoiny (1996) with a simple choice of adaptive smoothing parameter. Our work indicates that the sieve likelihood ratio statistics are indeed general and powerful for nonparametric inference based on function estimation.