Some Inference Problems in High-Dimensional Linear Models
- Author(s): Lopes, Miles Edward
- Advisor(s): Bickel, Peter J.
- et al.
During the past two decades, technological advances have led to a proliferation of high-dimensional problems in data analysis. The characteristic feature of such problems is that they involve large numbers of unknown parameters and relatively few observations. As the study of high-dimensional statistical models has developed, linear models have taken on a special status for their widespread application and extensive theory. Even so, much of the theoretical research on high-dimensional linear models has been concentrated on the problems of prediction and estimation, and many inferential questions regarding hypothesis tests and confidence intervals remain open.
In this dissertation, we explore two sets of inferential questions arising in high-dimensional linear models. The first set deals with the residual bootstrap (RB) method and the distributional approximation of regression contrasts. The second set addresses the issue of unknown sparsity in the signal processing framework of compressed sensing. Although these topics involve distinct methods and applications, the dissertation is unified by an overall focus on the interplay between model structure and inference. Specifically, our work is motivated by an interest in using inferential methods to confirm the existence of model structure, and in developing new inferential methods that have minimal reliance on structural assumptions.
The residual bootstrap method is a general approach to approximating the sampling distribution of statistics derived from estimated regression coefficients. When the number of regression coefficients p is small compared to the number of observations n, classical results show that RB consistently approximates the laws of contrasts obtained from least-squares coefficients. However, when p is comparable to n (p/n ~ 1), it is known that RB fails for some contrasts when it is applied to least-squares residuals. As a remedy, we propose an alternative method that is tailored to regression models involving near low-rank design matrices. In this situation, we prove that resampling the residuals of a ridge regression estimator can alleviate some of the problems that occur for least-squares residuals. Notably, our approach does not depend on sparsity in the true regression coefficients. Furthermore, the assumption of a near low-rank design is satisfied in many applications and can be inspected directly in practice.
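As a rough illustration of the residual bootstrap idea described above, the sketch below resamples centered residuals from a ridge fit, regenerates responses, refits, and records bootstrap draws of a contrast c^T (beta_star - beta_hat). The helper names `ridge_fit` and `residual_bootstrap_contrast` are hypothetical, and using one regularization level `lam` both for computing residuals and for refitting is an assumption made for simplicity, not necessarily the dissertation's choice. The synthetic design with decaying column scales is only a stand-in for a near low-rank design, and its singular values are printed because that is the kind of direct inspection the text mentions.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def residual_bootstrap_contrast(X, y, c, lam, n_boot=1000, seed=0):
    """Return beta_hat and bootstrap draws of c^T (beta_star - beta_hat).

    Residuals come from a ridge fit (hypothetical choice of lam), are
    centered, resampled with replacement, and added back to the fitted values.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    beta_hat = ridge_fit(X, y, lam)
    fitted = X @ beta_hat
    resid = y - fitted
    resid = resid - resid.mean()  # center the residuals before resampling
    draws = np.empty(n_boot)
    for b in range(n_boot):
        e_star = rng.choice(resid, size=n, replace=True)
        beta_star = ridge_fit(X, fitted + e_star, lam)
        draws[b] = c @ (beta_star - beta_hat)
    return beta_hat, draws


# Usage on a synthetic design whose column scales decay, so X is close to low rank.
rng = np.random.default_rng(0)
n, p = 100, 80
X = rng.standard_normal((n, p)) * np.arange(1, p + 1) ** -1.0
beta_true = rng.standard_normal(p)  # dense coefficients: no sparsity assumed
y = X @ beta_true + rng.standard_normal(n)
print(np.linalg.svd(X, compute_uv=False)[:5])  # inspect singular value decay
c = np.zeros(p)
c[0] = 1.0  # contrast that picks out the first coefficient
beta_hat, draws = residual_bootstrap_contrast(X, y, c, lam=1.0, n_boot=500)
print(c @ beta_hat, np.std(draws))
```

The spread of `draws` serves as a bootstrap approximation to the sampling variability of the contrast; this toy example makes no claim about the coverage of intervals built from it.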
In the second portion of the dissertation, we turn our attention to the subject of compressed sensing, which deals with the recovery of sparse high-dimensional signals from a limited number of linear measurements. Although the theory of compressed sensing offers strong recovery guarantees, many of its basic results depend on prior knowledge of the signal's sparsity level, a parameter that is rarely known in practice. Towards a resolution of this issue, we introduce a generalized family of sparsity parameters that can be estimated in a way that is free of structural assumptions. We show that our estimator is ratio-consistent with a dimension-free rate of convergence, and we also derive the estimator's limiting distribution. In turn, these results make it possible to set confidence intervals for the sparsity level and to test the hypothesis of sparsity in a precise sense.
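The abstract does not spell out the sparsity family or the estimator, so the following is only a hedged illustration. Assume the family s_q(x) = (||x||_q / ||x||_1)^(q/(1-q)) for q > 0, q != 1; it equals the number of nonzeros for a vector whose nonzero entries all have the same magnitude, and at q = 2 it reduces to s_2(x) = ||x||_1^2 / ||x||_2^2. The sketch estimates s_2 from noiseless random measurements in the spirit of stable random projections: with i.i.d. standard Cauchy rows, each measurement is distributed as ||x||_1 times a standard Cauchy variable (whose absolute value has median 1), and with i.i.d. standard Gaussian rows, each measurement is N(0, ||x||_2^2). The helper names, the noiseless setting, and the split into two measurement sets are assumptions, not the dissertation's procedure.

```python
import numpy as np


def sparsity_q(x, q):
    """Hypothetical helper: s_q(x) = (||x||_q / ||x||_1) ** (q / (1 - q)), q > 0, q != 1."""
    x = np.abs(np.asarray(x, dtype=float))
    norm_q = np.sum(x ** q) ** (1.0 / q)
    norm_1 = np.sum(x)
    return (norm_q / norm_1) ** (q / (1.0 - q))


def estimate_s2(x, m, rng):
    """Hypothetical noiseless estimator of s_2(x) = ||x||_1^2 / ||x||_2^2."""
    p = x.size
    y_cauchy = rng.standard_cauchy((m, p)) @ x  # each entry ~ ||x||_1 * Cauchy(0, 1)
    y_gauss = rng.standard_normal((m, p)) @ x   # each entry ~ N(0, ||x||_2^2)
    l1_hat = np.median(np.abs(y_cauchy))        # median of |Cauchy(0, 1)| is 1
    l2sq_hat = np.mean(y_gauss ** 2)            # unbiased for ||x||_2^2
    return l1_hat ** 2 / l2sq_hat


# Usage: a vector with 10 equal nonzero entries, so s_2(x) = 10.
rng = np.random.default_rng(0)
x = np.zeros(1000)
x[:10] = 1.0
print(sparsity_q(x, 2.0), estimate_s2(x, m=2000, rng=rng))
```

In this toy setup, the ratio estimate_s2(x) / s_2(x) is a function of i.i.d. Cauchy and Gaussian variables whose distribution depends only on m and not on the dimension p, which mirrors the dimension-free flavor of the result stated above.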