Neurogenomics in the mouse model: multivariate statistical methods and analyses
- Author(s): Zapala, Matthew Alan
- et al.
The use of high-throughput genomic technologies has led to significant advances in the study of the molecular anatomy of the mammalian brain and the creation of the field known as neurogenomics. Microarray gene expression data have been used to identify genes associated with specific brain functions, behaviors and disease-related phenotypes. These gene expression datasets have shed light on the molecular organization of both the developing and adult mammalian brain, specifically in the mouse model. To investigate the molecular organization of the adult mammalian brain, a gene expression-based brain map was built. Gene expression patterns for 24 neural tissues covering the mouse central nervous system were measured and it was found, surprisingly, that the adult brain bears a transcriptional "imprint" consistent with both embryological origins and classic evolutionary relationships. Beyond simply analyzing gene expression patterns within the brain, it is now possible to analyze genomic sequence data, such as single nucleotide polymorphisms (SNPs), in parallel with large-scale gene expression data in what has been called a genetical genomics approach to determine transcriptional regulatory networks. To further analyze the molecular organization of the mouse brain, we analyzed gene expression profiles of five brain regions from six inbred mouse strains and integrated these findings with SNP data available for the individual strains. We found that many transcriptional regulatory networks are highly specific to particular brain regions. The ability to query the rich, complementary data sources of gene expression and SNPs together offers a powerful route to unraveling the genetic determinants of complex polygenic diseases and phenotypes. However, appropriate data analysis strategies must be developed that can accommodate the complexity and high-dimensional aspects of these disparate data sources.
To address some of these analysis issues, we developed an algorithm to identify sequence variation in gene expression data, which can artificially affect expression signals and lead to false positive results. We also extended a new statistical technique, termed multivariate distance matrix regression, that tests the association of multivariate profiles arising from the high-dimensional data sets common in neurogenomics. The body of work presented herein attempts to assimilate the distinct fields of neuroanatomy, genomics, bioinformatics, statistical genetics and biostatistics to create novel analysis tools and develop new insights into biological processes related to neurogenomics.
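
The abstract names multivariate distance matrix regression without giving its form. As an illustration only, the sketch below follows the commonly described pseudo-F formulation of the general technique: a distance matrix between samples is Gower-centered, projected onto the predictors through a hat matrix, and the resulting statistic is assessed by permutation. It is an assumption about the general method, not the author's implementation; the function names `mdmr_pseudo_f` and `mdmr_permutation_test` and the toy data are hypothetical.

```python
import numpy as np


def mdmr_pseudo_f(D, X):
    """Pseudo-F statistic for an n x n distance matrix D and predictors X.

    Hypothetical helper: D holds pairwise distances between samples
    (e.g. between expression profiles); X holds one row of predictors
    per sample (e.g. a strain indicator or SNP genotype).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    X = np.asarray(X, dtype=float).reshape(n, -1)

    # Gower-centered inner-product matrix G = C (-1/2 D^2) C.
    C = np.eye(n) - np.ones((n, n)) / n
    G = C @ (-0.5 * D ** 2) @ C

    # Hat matrix for the predictors plus an intercept column.
    X1 = np.column_stack([np.ones(n), X])
    H = X1 @ np.linalg.pinv(X1)
    m = np.linalg.matrix_rank(X1)
    R = np.eye(n) - H

    explained = np.trace(H @ G @ H) / (m - 1)
    residual = np.trace(R @ G @ R) / (n - m)
    return explained / residual


def mdmr_permutation_test(D, X, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value).

    Rows of X are shuffled relative to D to build the null distribution.
    """
    rng = np.random.default_rng(seed)
    n = np.asarray(D).shape[0]
    X = np.asarray(X, dtype=float).reshape(n, -1)
    f_obs = mdmr_pseudo_f(D, X)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        if mdmr_pseudo_f(D, X[perm]) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)


# Toy example (made-up numbers): six samples in two groups, one variable.
y = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
group = np.array([0, 0, 0, 1, 1, 1])
D_toy = np.abs(y[:, None] - y[None, :])  # Euclidean distance in one dimension
f_toy, p_toy = mdmr_permutation_test(D_toy, group, n_perm=199)
```

With Euclidean distances on a single variable, the pseudo-F reduces to the ordinary one-way ANOVA F statistic, which gives a quick sanity check; the appeal of the distance-based form is that any distance between high-dimensional profiles can be substituted for `D_toy`.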