Investigating the Effects of Genetic Variation on Transcriptional Regulation
- Author(s): Shen, Zeyang
- Advisor(s): Glass, Christopher
- et al.
Thousands of genetic variants have been found to increase disease risk based on genome-wide association studies. Many of these variants are located outside of protein-coding regions, suggesting that they act by regulating gene transcription. However, the effects of non-coding genetic variation on transcriptional regulation are not fully understood. One way of interpreting these variants is to link them to the specific DNA sequences recognized by transcription factors (TFs), which are called motifs. I developed MAGGIE, a bioinformatic approach to identify functional motifs that mediate TF binding and function. Unlike many other motif analysis tools, MAGGIE associates motif mutations caused by non-coding variants with changes in TF binding or regulatory function, providing more direct insight into the regulatory effects of genetic variation. I demonstrated the strong performance of MAGGIE in various applications, including its ability to distinguish the divergent functions of distinct NF-kB factors in pro-inflammatory macrophages. As a detailed case study of the effects of non-coding variants, I applied MAGGIE to identify functional motifs for anti-inflammatory macrophages and discovered dominant TFs that drive the anti-inflammatory response and are also frequent targets of genetic variation affecting that response. In combination with an integrative analysis of transcriptomic and epigenomic data, I revealed quantitative variations in motif affinity underlying the divergent anti-inflammatory responses observed in genetically different mouse strains. By leveraging deep learning approaches, I pinpointed functional variants that alter functional motifs and provided strong evidence that deep learning is a promising way to identify functional variants. Finally, I went beyond motifs to systematically analyze the spacing between motifs and investigated its significance in the context of variant interpretation.
I found that most collaborative TFs do not require a constrained spacing between their motifs but instead tolerate a relaxed range of spacing. Based on synthetic genetic variations from mutagenesis experiments and millions of naturally occurring variations, I showed that spacing alterations are generally tolerated by TF binding and regulatory function at TF binding sites. Collectively, these findings advance our understanding of how non-coding genetic variation influences gene transcription and phenotypic diversity.
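To make the idea of associating motif mutations with binding changes concrete, the following is a minimal sketch, not MAGGIE's actual implementation. It scores two alleles of the same locus against a motif and reports the difference in the best motif score. The matrix `TOY_PFM`, the function names, and the example sequences are all hypothetical and invented for illustration; a uniform background and forward-strand scanning are simplifying assumptions.

```python
import math

# Hypothetical toy position frequency matrix (rows = motif positions,
# columns = A, C, G, T). The numbers are invented and describe no real TF;
# the consensus of this toy motif is GGAA.
TOY_PFM = [
    [0.05, 0.05, 0.85, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.85, 0.05, 0.05, 0.05],
    [0.85, 0.05, 0.05, 0.05],
]
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}
BACKGROUND = 0.25  # assumption: uniform base composition


def window_score(pfm, window):
    """Log-odds score of one motif-width window against the background."""
    return sum(
        math.log2(pfm[i][BASE_INDEX[base]] / BACKGROUND)
        for i, base in enumerate(window)
    )


def best_motif_score(pfm, seq):
    """Highest log-odds score over all forward-strand windows in seq."""
    width = len(pfm)
    return max(
        window_score(pfm, seq[i:i + width])
        for i in range(len(seq) - width + 1)
    )


def motif_score_difference(pfm, seq_bound, seq_unbound):
    """Best score of the bound allele minus best score of the unbound allele."""
    return best_motif_score(pfm, seq_bound) - best_motif_score(pfm, seq_unbound)


# Hypothetical alleles differing by a single G-to-C variant inside the motif.
bound_allele = "TTGGAATT"
unbound_allele = "TTGCAATT"
diff = motif_score_difference(TOY_PFM, bound_allele, unbound_allele)
print(f"motif score difference: {diff:.2f}")
```

A positive difference means the allele with stronger TF binding also carries the better motif match. Collecting such differences across many variant-containing loci and testing whether they are consistently positive or negative is one way to ask whether a motif is functional, in the spirit of the approach described above.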