Simulation-based Significance Test for Lasso-Type Problems
The lasso has been shown to be effective for variable selection and sparse modeling. It can be applied to select a parsimonious set of predictors for efficient prediction of a response variable. The goal of this thesis is to develop a significance test for whether all truly active variables are contained in the current lasso model.
We design the following test statistics: T1, the L1-norm of the coefficients; T2, the maximum absolute value of the coefficients; and T3, the covariance test statistic. Simulation-based methods, such as direct sampling and importance sampling, are applied to draw samples and calculate the first two test statistics. The third statistic, the covariance test statistic, is constructed from the lasso fitted values; its null distribution is tractable and asymptotically Exp(1). The power curves of T1 and T2 differ slightly. A further test assesses the significance of the predictor variables in the sequence of models visited along the lasso solution path. All three statistics are effective at selecting truly active variables; in terms of efficiency, however, T3 is the better choice.
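As a rough illustration of how T1 and T2 can be calibrated by direct sampling, the following minimal sketch computes a Monte Carlo p-value. It rests on several assumptions that are not taken from the thesis: a simplified global null (no active variables), a known noise level `sigma`, a fixed penalty `lam`, and a plain coordinate-descent lasso solver. All function names are hypothetical. The helper `exp1_pvalue` only evaluates the Exp(1) tail probability for a given value of T3; it does not compute T3 itself.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    X is a list of n rows, each a list of p floats; no intercept is fitted.
    The objective scaling is an assumption made for this sketch.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # (1/n) x_j' (partial residual), with coordinate j added back in
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def t1_l1_norm(beta):
    """T1: L1-norm of the lasso coefficients."""
    return sum(abs(b) for b in beta)


def t2_max_abs(beta):
    """T2: maximum absolute value of the lasso coefficients."""
    return max(abs(b) for b in beta)


def mc_pvalue(X, y, lam, stat, sigma=1.0, n_sim=199, seed=0):
    """Monte Carlo p-value by direct sampling under the assumed global null.

    Responses are simulated as y* ~ N(0, sigma^2 I), the lasso is refitted,
    and the simulated statistics are compared with the observed one.
    """
    rng = random.Random(seed)
    n = len(y)
    observed = stat(lasso_cd(X, y, lam))
    exceed = 0
    for _ in range(n_sim):
        y_null = [rng.gauss(0.0, sigma) for _ in range(n)]
        if stat(lasso_cd(X, y_null, lam)) >= observed:
            exceed += 1
    # The +1 terms keep the estimate strictly positive.
    return (1 + exceed) / (n_sim + 1)


def exp1_pvalue(t3):
    """Asymptotic p-value for a covariance test statistic: P(Exp(1) > t3)."""
    return math.exp(-t3)
```

A call such as `mc_pvalue(X, y, lam=0.2, stat=t1_l1_norm)` returns the estimated p-value for T1; passing `t2_max_abs` instead gives the corresponding value for T2. Importance sampling would replace the direct draws of `y_null` with draws from a proposal distribution and reweight them, which this sketch does not attempt.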