eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Regression with complex data: regularization, prediction and bootstrap

Abstract

Analyzing a linear model is a fundamental and well-studied topic in statistical inference. However, the complex nature of modern data brings new challenges to statisticians: existing theories and methods may fail to provide consistent results. Focusing on a high-dimensional linear model with either i.i.d. errors or heteroskedastic and dependent errors, this dissertation introduces a new ridge regression method, called "debiased and thresholded ridge regression", and uses it to fit the linear model. It then introduces new bootstrap algorithms and applies them to construct consistent simultaneous confidence intervals and to perform hypothesis tests for linear combinations of the model parameters. In addition, the dissertation applies a bootstrap algorithm to construct simultaneous prediction intervals for future observations. Numerical experiments show that the new ridge regression method performs well compared with other complex methods such as the Lasso or the threshold Lasso.
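The three ingredients named above (a ridge fit, a bias correction, and a thresholding step) can be illustrated with a minimal sketch. This is not the dissertation's exact estimator: the function name, the one-step plug-in bias correction, and the tuning parameters `rho` and `threshold` are all assumptions made for illustration, and the setting shown is the simple case with fewer predictors than observations.

```python
import numpy as np

def debiased_thresholded_ridge(X, y, rho, threshold):
    """Illustrative sketch: ridge fit, one-step bias correction, hard threshold.

    `rho` (ridge penalty) and `threshold` are hypothetical tuning parameters.
    """
    p = X.shape[1]
    gram = X.T @ X + rho * np.eye(p)
    # Ridge estimate: (X'X + rho I)^{-1} X'y
    ridge = np.linalg.solve(gram, X.T @ y)
    # The ridge estimate has bias -rho (X'X + rho I)^{-1} beta.
    # Plug the ridge estimate in for beta and add the estimated bias back.
    debiased = ridge + rho * np.linalg.solve(gram, ridge)
    # Hard thresholding: coefficients with small magnitude are set to zero.
    return np.where(np.abs(debiased) > threshold, debiased, 0.0)
```

In this sketch the thresholding step is what produces exact zeros, so the fitted coefficient vector can be sparse even though ridge regression on its own never sets a coefficient exactly to zero.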

This dissertation also studies the properties of a residual-based bootstrap prediction interval. It derives the asymptotic distribution of the difference between the conditional coverage probability of a nominal prediction interval and the conditional coverage probability of a prediction interval obtained via a residual-based bootstrap. This result shows that the residual-based bootstrap prediction interval has about a 50% probability of yielding conditional under-coverage. Moreover, the dissertation introduces a new bootstrap prediction interval that has the desired asymptotic conditional coverage probability and probability of conditional under-coverage.
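For orientation, a generic residual-based bootstrap prediction interval for a low-dimensional least-squares fit might look like the sketch below. It is a textbook-style construction based on bootstrap predictive roots, not the new interval proposed in the dissertation; the function name and the defaults `alpha`, `n_boot`, and `seed` are assumptions.

```python
import numpy as np

def residual_bootstrap_prediction_interval(X, y, x_new, alpha=0.05,
                                           n_boot=1000, seed=0):
    """Illustrative residual-bootstrap prediction interval for OLS."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    resid = resid - resid.mean()  # center residuals before resampling
    point = x_new @ beta_hat
    roots = np.empty(n_boot)
    for b in range(n_boot):
        # Bootstrap sample: fitted values plus resampled residuals.
        y_star = X @ beta_hat + rng.choice(resid, size=n, replace=True)
        beta_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        # Bootstrap future observation at x_new, and its predictive root.
        y_new_star = point + rng.choice(resid)
        roots[b] = y_new_star - x_new @ beta_star
    lo, hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return point + lo, point + hi
```

The interval is built from quantiles of the bootstrap predictive roots, so its coverage conditional on the observed data depends on how well the resampled residuals mimic the true error distribution, which is the kind of conditional behavior the paragraph above analyzes.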
