Sequential Multiple Testing for Variable Selection in High Dimensional Linear Model
The covariance test was proposed for testing the significance of the predictor variable that enters the current lasso model along the lasso solution path. In this paper, we propose a sequential multiple testing structure based on covariance test p-values, which has good power properties while controlling the error rate at a desired level. Specifically, we consider the full set of underlying hypotheses and error rate control both within each step and across all steps along the lasso solution path.
Our sequential multiple hypothesis structure is valid because of the asymptotic distribution of the covariance test. We also prove that the minimum composite p-values under the null hypothesis increase along the steps, which is desirable when applying a step-down procedure for error rate control. Making use of the multivariate structure can increase power in scenarios where other procedures stop the selection too early. In simulations, our Hybrid procedures also show higher power for weak signals and for high-dimensional data.
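To make the covariance test p-values concrete, the following is a minimal sketch for the special case of a design matrix with orthonormal columns and known noise level. It assumes the standard simplification in that case, where the lasso knots are the sorted absolute inner products |X^T y| and the statistic at step k is T_k = lambda_k (lambda_k - lambda_{k+1}) / sigma^2, compared against an Exp(1) reference distribution. The function name `covtest_pvalues_orthonormal` and its arguments are hypothetical and chosen only for illustration; this is not the general covariance test and not the Hybrid procedures proposed here.

```python
import numpy as np

def covtest_pvalues_orthonormal(X, y, sigma=1.0):
    """Covariance test p-values along the lasso path (illustrative sketch).

    Assumes X has orthonormal columns (X.T @ X = I) and sigma is known.
    Returns one p-value for each of the first p - 1 steps of the path.
    """
    U = X.T @ y
    # With orthonormal columns, the lasso knots are the sorted |U_j|.
    knots = np.sort(np.abs(U))[::-1]
    # T_k = lambda_k * (lambda_k - lambda_{k+1}) / sigma^2
    T = knots[:-1] * (knots[:-1] - knots[1:]) / sigma**2
    # Upper tail of the Exp(1) reference distribution.
    return np.exp(-T)

# Toy example with an identity design (assumed data, not from the study).
X = np.eye(3)
y = np.array([3.0, 1.0, 2.0])
pvals = covtest_pvalues_orthonormal(X, y, sigma=1.0)
```

In this toy example the knots are 3, 2, 1, so the two statistics are 3 and 2, and the p-values are exp(-3) and exp(-2).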
To control the FWER, we propose the Hybrid Bonferroni-Holm Step-Down procedure along with the Hybrid Hochberg-Holm and Hybrid Simes-Holm Step-Down procedures, and compare them with StrongStop. To control the FDR, we propose the Hybrid Bonferroni-Benjamini-Liu Step-Down procedure and compare it with the ForwardStop, StrongStop, and TailStop procedures and with SLOPE. Simulation studies show that our proposed procedures have higher power with both the FWER and the FDR controlled at the desired level, especially for large-scale and high-dimensional data, and that they remain stable when the design matrix is correlated.
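For readers unfamiliar with step-down testing, the classical Holm procedure, one of the building blocks named above, can be sketched as follows. This is the standard textbook Holm step-down rule applied to a fixed set of p-values, not the Hybrid Bonferroni-Holm procedure itself, whose across-step structure is developed in Chapter 5. The function name `holm_step_down` and the example p-values are assumptions made for illustration.

```python
import numpy as np

def holm_step_down(pvalues, alpha=0.05):
    """Classical Holm step-down procedure controlling the FWER at alpha.

    Returns a boolean array marking which hypotheses are rejected.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)              # indices from smallest to largest p-value
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                      # stop at the first non-rejection
    return reject

# Illustrative p-values (assumed, not from the simulations).
rejected = holm_step_down([0.01, 0.04, 0.03, 0.005], alpha=0.05)
```

Here the smallest p-value 0.005 is compared with 0.05/4, then 0.01 with 0.05/3, and the procedure stops at 0.03, which exceeds 0.05/2, so only the first and fourth hypotheses are rejected.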
In our work, we first review variable selection in statistics and the most commonly used variable selection methods. The lasso, one of the most widely used and well-developed methods, is introduced in Chapter 2. The covariance test proposed for testing significance in the lasso is presented in Chapter 3, along with its properties for an orthogonal matrix X and its asymptotic distribution. We then propose our sequential multiple hypothesis testing structure built on the lasso covariance test in Chapter 4. In Chapter 5, we develop the Hybrid Bonferroni-Holm Step-Down procedure to control the FWER at level alpha, and compare it with StrongStop. In Chapter 6, we propose the Hybrid Bonferroni-Benjamini-Liu Step-Down method and prove that the FDR can be controlled at level q. We also compare it with ForwardStop, StrongStop, TailStop, and SLOPE through simulation studies. In Chapter 7, we present two applications of our proposed procedure, to a diabetes data set and to Framingham Heart Study data, and compare it with the other procedures. The conclusion and discussion are given at the end. The proofs of the main theorems and lemmas are included in the appendix.