Sparse modeling has become an important and rapidly developing topic in many applications of statistics, machine learning, and signal processing. Its main objective is to discover a small number of predictive patterns that improve our understanding of the data. This paper extends the idea of sparse modeling to the variable selection problem in high-dimensional linear regression with multiple response vectors that share the same or similar subsets of predictor variables, which must be selected from a large set of candidate variables. In the literature, this problem is called multi-task learning, support union recovery, or simultaneous sparse coding, depending on the context.
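One way to write the setting described above, as a sketch only: the symbols n (observations), p (candidate predictors), K (response vectors), X, y_k, beta_k, and S_k are notation assumed here for illustration and are not taken from the text.

```latex
% Assumed notation: X is the n x p design matrix shared by all K responses.
\[
  y_k = X \beta_k + \varepsilon_k, \qquad k = 1, \dots, K, \qquad p \gg n .
\]
% Individual support of response k, and the shared (union) support.
\[
  S_k = \{\, j : \beta_{jk} \neq 0 \,\}, \qquad
  S = \bigcup_{k=1}^{K} S_k, \qquad |S| \ll p .
\]
```

In this notation, "the same or similar subsets" means that the sets S_k overlap heavily, so estimating them jointly can borrow strength across the K responses.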
We present a Bayesian method for solving this problem by introducing two nested sets of binary indicator variables. In the first set, each indicator is associated with a predictor variable (regressor) and indicates whether that variable is active for any of the response vectors. In the second set, each indicator is associated with both a predictor variable and a response vector and indicates whether that variable is active for that particular response vector. The variable selection problem is solved by sampling from the posterior distributions of the two sets of indicator variables. We develop a Gibbs sampling algorithm for posterior sampling and use the generated samples to identify the active support at both the shared and the individual level. Both theoretical and simulation-based justification are provided in the paper.
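The nested-indicator idea can be illustrated with a small sampler. The following is a minimal sketch, not the paper's algorithm: it assumes Gaussian noise with known variance `sigma2`, a Gaussian "slab" prior with variance `tau2` on active coefficients, and Bernoulli priors `pi_shared` and `pi_task` on the two indicator levels. None of these choices, nor the function name `gibbs_nested_indicators`, come from the text above.

```python
import numpy as np


def gibbs_nested_indicators(X, Y, n_iter=300, burn_in=100, pi_shared=0.2,
                            pi_task=0.5, sigma2=1.0, tau2=1.0, seed=0):
    """Blocked Gibbs sampler for an ASSUMED nested spike-and-slab model.

    w[j]    = 1 if predictor j is active for any response (shared level).
    z[j, k] = 1 if predictor j is active for response k; forced to 0 when w[j] = 0.
    Returns posterior inclusion frequencies (w_prob, z_prob).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    K = Y.shape[1]
    B = np.zeros((p, K))            # current coefficients
    R = Y - X @ B                   # current residuals, shape (n, K)
    xx = np.sum(X ** 2, axis=0)     # squared column norms
    w_sum = np.zeros(p)
    z_sum = np.zeros((p, K))
    for it in range(n_iter):
        for j in range(p):
            xj = X[:, j]
            R += np.outer(xj, B[j])             # remove predictor j's fit
            # Conditional posterior of each coefficient if it were active.
            v = 1.0 / (xx[j] / sigma2 + 1.0 / tau2)
            m = v * (xj @ R) / sigma2           # shape (K,)
            # Log Bayes factor (active vs. inactive), coefficient integrated out.
            log_bf = 0.5 * np.log(v / tau2) + 0.5 * m ** 2 / v
            log_on = np.log(pi_task) + log_bf
            log_norm = np.logaddexp(np.log1p(-pi_task), log_on)
            # Shared indicator: task-level indicators are summed out.
            l1 = np.log(pi_shared) + log_norm.sum()
            l0 = np.log1p(-pi_shared)
            w = rng.random() < np.exp(l1 - np.logaddexp(l0, l1))
            # Task-level indicators, nested inside the shared one.
            z = w & (rng.random(K) < np.exp(log_on - log_norm))
            # Draw active coefficients from their Gaussian conditionals.
            B[j] = np.where(z, m + np.sqrt(v) * rng.standard_normal(K), 0.0)
            R -= np.outer(xj, B[j])             # restore residuals
            if it >= burn_in:
                w_sum[j] += w
                z_sum[j] += z
    n_kept = n_iter - burn_in
    return w_sum / n_kept, z_sum / n_kept


# Synthetic demo; all sizes and coefficient values are hypothetical.
rng = np.random.default_rng(1)
n, p, K = 100, 20, 4
X = rng.standard_normal((n, p))
B_true = np.zeros((p, K))
B_true[0] = 3.0          # active for every response
B_true[1, :2] = -3.0     # active for the first two responses only
B_true[2] = 3.0          # active for every response
Y = X @ B_true + rng.standard_normal((n, K))

w_prob, z_prob = gibbs_nested_indicators(X, Y)
print(np.round(w_prob[:5], 2))   # shared-level inclusion frequencies
print(np.round(z_prob[:3], 2))   # task-level inclusion frequencies
```

Thresholding `w_prob` gives an estimate of the shared support, and thresholding each column of `z_prob` gives the individual supports. Sampling the shared indicator with the task-level indicators summed out is a design choice made here so that a predictor can switch off for all responses in a single move.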
The proposed algorithm is also demonstrated on real image data sets. To learn the patterns of an object in images, we treat each image as a separate task. By combining images that contain objects of the same category, we can not only learn the shared patterns efficiently but also obtain an individual sketch of each image.