Statistical inference for high-dimensional data has attracted considerable interest in recent years. This dissertation contributes to this area through three main components: (1) a new post-selection inference method, (2) group inference methods, and (3) a new R package.
First, a new method to construct confidence sets after lasso variable selection is developed, with strong numerical support for its accuracy and effectiveness.
A key component of my method is to sample from the conditional distribution of the response $y$ given the lasso active set, which, in general, is very challenging due to the tiny probability of the conditioning event. This technical difficulty is overcome by using estimator augmentation to simulate from this conditional distribution via Markov chain Monte Carlo given any estimate $\tilde \mu$ of the mean $\mu_0$ of $y$. A randomization step for the estimate $\tilde \mu$ is then incorporated in my sampling procedure, which may be interpreted as simulating from a posterior predictive distribution by averaging over the uncertainty in $\mu_0$. My Monte Carlo samples offer great flexibility in the construction of confidence sets for multiple parameters.
Extensive numerical results show that my method constructs confidence sets with the desired coverage rate and, moreover, that the diameter and volume of my confidence sets are substantially smaller than those of a state-of-the-art method.
Second, the advantages of grouping variables are demonstrated through extensive numerical results. The parametric bootstrap with a refitted, thresholded group lasso estimator is compared with competing methods. Applications of estimator augmentation to the group lasso are then introduced, including an importance sampler and a de-biased parametric bootstrap. The importance sampler is shown to outperform sampling directly from the target distribution by several orders of magnitude.
Lastly, an R package, EAinference, which stems from estimator augmentation methods, is introduced. The package contains a parametric bootstrap, an importance sampler, a Metropolis-Hastings sampler, and many related simulation-based inference tools. The package is available on CRAN.
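
To illustrate why an importance sampler can outperform direct sampling when the event of interest has tiny probability, the following is a minimal, generic sketch in Python. It is not the group lasso sampler developed in this dissertation and does not use EAinference. It estimates the tail probability $P(Z > t)$ for a standard normal $Z$. The threshold `t`, the shifted proposal $N(t, 1)$, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def tail_prob_direct(t, n, rng):
    """Plain Monte Carlo: fraction of N(0, 1) draws that exceed t."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > t)
    return hits / n


def tail_prob_importance(t, n, rng):
    """Importance sampling with the (assumed) proposal N(t, 1).

    Each draw z from the proposal is reweighted by the density ratio
    phi(z) / phi(z - t) = exp(-t * z + t**2 / 2).
    """
    total = 0.0
    for _ in range(n):
        z = rng.gauss(t, 1.0)
        if z > t:
            total += math.exp(-t * z + 0.5 * t * t)
    return total / n


def tail_prob_exact(t):
    """Exact tail probability via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


if __name__ == "__main__":
    rng = random.Random(0)
    t, n = 4.0, 20000
    print("exact     :", tail_prob_exact(t))
    print("direct    :", tail_prob_direct(t, n, rng))
    print("importance:", tail_prob_importance(t, n, rng))
```

With `t = 4`, the exact probability is about $3.2 \times 10^{-5}$, so a direct sample of this size is expected to contain fewer than one exceedance, and its estimate is often exactly zero. About half of the proposal draws fall in the tail, which is why the reweighted estimate is far more stable for the same number of draws.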