Covariance matrix estimation and variable selection in high dimension
The first part of the thesis focuses on sparse covariance matrix estimation in the high-dimensional regime of large dimension p and small sample size n. In particular, we consider a class of covariance matrices that are approximately block diagonal under an unknown permutation of the variables. We propose a block recovery estimator and show that it achieves the minimax optimal convergence rate for this class, the same rate as if the permutation were known. The problem is also related to sparse PCA and the k-densest subgraph problem; the spiked model is a special case of their intersection. Simulations on the spiked model and a multiple-block model, together with a real-world application, confirm that the proposed estimator is both statistically and computationally efficient.
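The block recovery idea can be illustrated with a minimal sketch: threshold the sample covariance, treat surviving entries as edges of a graph, and take connected components as the recovered blocks. This is a hypothetical simplification for intuition only, not the estimator analyzed in the thesis; the threshold `tau` and the BFS component search are illustrative choices. Note that the result is invariant to any permutation of the variables, which is the point of the construction.

```python
import numpy as np

def block_recovery_estimate(X, tau):
    """Sketch of a block-recovery covariance estimator.

    Threshold the sample covariance, treat surviving off-diagonal
    entries as edges of a graph, recover blocks as the graph's
    connected components, and zero out all between-block entries.
    (A hypothetical simplification, not the thesis's estimator.)
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False)           # p x p sample covariance
    adj = np.abs(S) > tau                 # thresholded adjacency matrix
    # Connected components via a simple BFS (permutation-invariant).
    labels = -np.ones(p, dtype=int)
    comp = 0
    for start in range(p):
        if labels[start] >= 0:
            continue
        stack = [start]
        labels[start] = comp
        while stack:
            i = stack.pop()
            for j in np.nonzero(adj[i])[0]:
                if labels[j] < 0:
                    labels[j] = comp
                    stack.append(j)
        comp += 1
    # Keep only within-block covariance entries.
    mask = labels[:, None] == labels[None, :]
    return S * mask, labels
```

Because block membership is read off the graph rather than off variable order, shuffling the columns of X only relabels the recovered blocks.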
The second part of the thesis focuses on variable selection in linear regression, again in the high-dimensional setting of large p and small n. We propose a general framework for searching variables based on their covariance structure, together with a specific variable selection algorithm, kForward, which iteratively fits local/small linear models among relatively highly correlated variables. In simulation experiments and on a real-world data set, we compare kForward with other popular methods, including the Lasso, the Elastic Net, SCAD, MC+, and FoBa, for both variable selection and prediction.
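The flavor of a correlation-guided forward search can be sketched as follows. This is a hypothetical simplification for illustration, not the thesis's kForward algorithm: at each step it picks the variable most correlated with the current residual, adds a small local group of its k most correlated neighbours, refits ordinary least squares on everything selected so far, and updates the residual. The function name, the parameters `k` and `n_steps`, and the stopping rule are all assumptions.

```python
import numpy as np

def local_forward_select(X, y, k=2, n_steps=3):
    """Sketch of forward variable search guided by correlation structure.

    At each step: pick the variable most correlated with the current
    residual, add its k most correlated neighbours as a small local
    group, refit ordinary least squares on all selected variables,
    and recompute the residual.  (A hypothetical simplification,
    not the thesis's kForward algorithm.)
    """
    n, p = X.shape
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
    yc = y - y.mean()
    corr = np.corrcoef(Xc, rowvar=False)        # p x p variable correlations
    selected = []
    resid = yc.copy()
    for _ in range(n_steps):
        scores = np.abs(Xc.T @ resid)           # correlation with residual
        scores[selected] = -np.inf              # never re-pick a variable
        j = int(np.argmax(scores))
        # local group: j together with its k most correlated variables
        nbrs = np.argsort(-np.abs(corr[j]))[: k + 1]
        selected = sorted(set(selected) | {int(v) for v in nbrs})
        # refit a small linear model on the selected variables
        A = Xc[:, selected]
        beta, *_ = np.linalg.lstsq(A, yc, rcond=None)
        resid = yc - A @ beta
    return selected
```

Fitting only small local models keeps each least-squares problem well-conditioned even when p greatly exceeds n, which is the motivation for working with groups of correlated variables rather than the full design matrix.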