UC Santa Cruz
On Bayesian Methods in Network Regression
- Author(s): Guha, Sharmistha
- Advisor(s): Rodriguez, Abel
- et al.
Interest in connectomics, the study of interconnections or networks within the human brain, has grown in recent years. This interest has been spurred by the development of new imaging technologies, which allow researchers to peer non-invasively into the human brain and obtain data on its connections. Motivated by these datasets, this dissertation develops a novel class of Bayesian regression models that study the relationships between neuroscientific phenotypes and the brain connectome networks of individuals.
First, we introduce a regression framework for a continuous phenotypic response with the brain network (represented as a symmetric matrix) as the predictor. We propose a novel network shrinkage prior on the network predictor coefficient matrix. The proposed framework is able to identify the nodes (functional regions) of the brain network, and the interconnections between regions, that are significantly related to the phenotypic response. To the best of our knowledge, it is the first principled Bayesian framework that enables identification of network nodes and edges significantly related to the response. The performance of the proposed model is evaluated against a wide range of existing competitors from the high dimensional frequentist and Bayesian literature using a variety of simulation studies. Applied to brain connectome data from a group of subjects, the proposed model identifies important brain regions and interconnections significantly associated with creativity.
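As a minimal illustration of the data structure involved, the following sketch simulates symmetric network predictors and a continuous response. It assumes, purely for illustration, that the response depends on the network through a sum over edges of predictor entry times coefficient entry; this abstract does not state the model equation. All function names, dimensions, and parameter values are hypothetical, and no shrinkage prior is implemented here.

```python
import numpy as np

def upper_triangle(mat):
    """Return the strictly upper-triangular entries of a square matrix as a vector."""
    rows, cols = np.triu_indices(mat.shape[0], k=1)
    return mat[rows, cols]

def simulate_network_regression(n_subjects=50, n_nodes=6, noise_sd=0.1, seed=0):
    """Simulate symmetric network predictors and a continuous response.

    Assumed (illustrative) model: y_i = sum_{k<l} A_i[k, l] * B[k, l] + noise.
    """
    rng = np.random.default_rng(seed)

    # Symmetric coefficient matrix with zero diagonal. Only edges touching
    # node 0 are nonzero, mimicking a single influential node.
    B = np.zeros((n_nodes, n_nodes))
    B[0, 1:] = rng.normal(size=n_nodes - 1)
    B = B + B.T

    # Symmetric network predictors with zero diagonal, one per subject.
    A = np.triu(rng.normal(size=(n_subjects, n_nodes, n_nodes)), k=1)
    A = A + np.transpose(A, (0, 2, 1))

    # Each symmetric matrix is fully described by its upper triangle, so the
    # network regression can be written as a linear regression on edge vectors.
    X = np.stack([upper_triangle(a) for a in A])
    y = X @ upper_triangle(B) + noise_sd * rng.normal(size=n_subjects)
    return A, B, X, y
```

With 6 nodes there are 15 distinct edges, so even small networks produce many coefficients per subject. This is one reason a prior that shrinks whole nodes as well as individual edges can be attractive.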
Next, we extend our model to build network classifiers for settings in which a brain connectome network and a binary response are available for each individual in a group. Here we develop a broader class of global-local network shrinkage priors, which includes the prior distribution specified earlier as a special case. We consider two different global-local network shrinkage priors from this class and investigate them using simulation studies. In particular, we assess their performance in terms of network classification and of identifying the network nodes and edges that are influential for classification. We also demonstrate superior performance of our proposed network classifiers over state-of-the-art high dimensional classification techniques. Another major contribution is the development of theoretical conditions that guarantee asymptotically consistent classification for the proposed framework. In particular, we derive conditions on the number of network nodes and on the sparsity of the network coefficient matrix, both as functions of the sample size, under which asymptotically optimal classification is achieved. While theoretical results on high dimensional binary regression with ordinary shrinkage priors have emerged recently, developing theory for our network classifier model involves several additional challenges due to the complex nature of the global-local shrinkage prior developed here. The framework is used to classify individuals into high and low IQ groups based on their brain connectomes.
Notably, the work discussed in the last two paragraphs tacitly assumes that each node and edge has the same impact on the phenotype for every individual. In our next project, we study a brain connectome dataset in which this assumption is violated. There is a relatively underdeveloped literature in neuroscience arguing that different groups of individuals share different relationships between brain networks and phenotypes. This literature, however, lacks a principled Bayesian approach that both accounts for group-specific relationships of nodes and edges with the response and facilitates clustering of individuals. Motivated by this problem and our dataset, we develop a Bayesian network mixture regression model. Simulation studies and analysis of the brain connectome dataset demonstrate superior performance of the proposed approach over the approach described earlier. Simulation studies are also used to evaluate the performance of the proposed approach as the true and fitted numbers of clusters, the size of the network, and the sample size vary.
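To make the clustering idea concrete, the sketch below computes, for a generic finite mixture of linear regressions, the posterior probability that each subject belongs to each cluster when the cluster-specific coefficients are held fixed. It is a textbook calculation under assumed Gaussian noise with a common standard deviation, not the sampler developed in the dissertation; the function name and all arguments are hypothetical.

```python
import numpy as np

def cluster_responsibilities(X, y, betas, weights, noise_sd):
    """Posterior cluster membership probabilities for fixed mixture parameters.

    X        : (n, p) matrix of edge-level predictors, one row per subject
    y        : (n,) continuous responses
    betas    : (K, p) cluster-specific coefficient vectors (assumed known here)
    weights  : (K,) mixture weights summing to one
    noise_sd : noise standard deviation shared by all clusters (an assumption)
    """
    # resid[i, k] = y_i - x_i' beta_k
    resid = y[:, None] - X @ betas.T
    # Log of weight_k * Normal(y_i | x_i' beta_k, noise_sd^2), dropping the
    # normalizing constant, which is identical across clusters.
    log_post = np.log(weights)[None, :] - 0.5 * (resid / noise_sd) ** 2
    # Subtract the row maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)
```

In a full Bayesian treatment these probabilities would typically be used to sample cluster labels within each iteration, alternating with updates of the cluster-specific coefficients.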
For all of these projects, computationally efficient Bayesian sampling algorithms are developed, enabling computation even for reasonably large networks with moderately large sample sizes.