Many neurobehavioral traits are highly heritable, yet the specific genes underlying them are difficult to identify. Copy number variants (CNVs), which are large, often multi-gene deletions and duplications, are one category of mutation that drives traits including autism spectrum disorder (ASD), bipolar disorder, schizophrenia, obesity, and intellectual disability. A 600 kb, 30-gene region on chromosome 16p11.2, for example, is strongly associated with ASD when duplicated or deleted, while deletions of a 3 Mb, 60-gene region on chromosome 22q11.2 are a well-known genetic cause of schizophrenia. Under the assumption that a subset of these 30 or 60 genes contributes to the associated traits, either individually or in combination, we investigate the genic architecture of CNV association with neurobehavioral traits.
We consider CNVs’ impact on traits in the context of gene expression: individuals who carry duplications have increased expression of the affected genes (1.5x normal), while those who carry deletions have decreased expression (0.5x normal). Among non-carriers, expression levels vary as well, although typically to a smaller extent. This gives rise to a hypothesis: expression variation of the genes in a CNV region may be associated with the CNV-associated (or related) traits in non-carriers. Under this hypothesis, we can separate out the association of the expression of specific CNV genes with specific traits, which cannot be done in a CNV carrier, in whom all genes in the region are upregulated or downregulated together.
We studied large populations of non-carriers with genetic data (and corresponding controls, where applicable) for five neurobehavioral traits: ASD, bipolar disorder, schizophrenia, BMI (as a proxy for obesity), and IQ (as a proxy for intellectual disability). We took an expression prediction approach that allowed us to convert GWAS-style data into imputed expression-level data suitable for association analyses. We studied the association of individual CNV genes, pairs of CNV genes, and all genes in the region with these five traits. The same study design was also used to assess association between CNV gene expression variation and clinical traits within a large biobank with genotypic and clinical information.
The second chapter of this dissertation focuses on individual genes within GWAS datasets and clinical biobanks, and the third focuses on the extension of this approach to combinations of genes and the entire region. In support of our hypothesis, we detected individual genes at 16p11.2 associated with neurobehavioral traits, most notably INO80E, which was significantly associated with schizophrenia and BMI and nominally associated with bipolar disorder. Using the biobank, we found additional genes associated with related clinical traits including psychosis and mood disorders, with an over-representation of mental disorders among CNV gene-associated phenotypes. We then found that variance in traits was better explained by pairs of CNV genes than by single genes in nearly all instances, including those where we had identified single-gene associations. The region-wide prediction was associated with BMI and IQ at both 16p11.2 and 22q11.2, but not with any neuropsychiatric trait. The importance of the combinatorial contributions of genes did not extend to matched control regions for the same traits. In sum, our studies provide insight into the transcription-based action of CNV genes, identify potential candidate genes for further study, describe combinatorial patterns of CNV gene impacts on neurobehavior, and demonstrate the utility of integrating genetic, clinical, and transcriptomic data for in silico analyses.