eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Deep learning predicts the impact of non-coding genetic variants in human traits and diseases

Abstract

In the human genome, the vast majority of DNA is non-coding. Although non-coding DNA does not directly encode protein sequences, it is vital to the transcriptional regulation of the protein-coding process. Recent genome-wide association studies (GWAS) have shown that ~93% of genetic variants driving common human traits and diseases lie within non-coding sequences. However, because these non-coding variants act in complicated and indirect ways, it is difficult for traditional analysis methods to sift through the large number of non-coding sequences and pinpoint the variants causal to human diseases and traits.

In this dissertation, I present AgentBind, a deep learning framework that identifies and interprets the sequence features most predictive of regulatory activities, such as transcription factor binding, histone modification, and chromatin accessibility. I demonstrate that AgentBind is applicable to diverse biological tasks, including (1) pinpointing the sequence features most important for transcription factor binding; (2) prioritizing genetic variants in transcriptional enhancers associated with human brain disorders; and (3) identifying the dominant combinations of lineage-determining and signal-dependent transcription factors driving enhancer activation in mice. Collectively, these studies provide a deep learning framework, together with use cases, for decoding the rules within non-coding regulatory regions and identifying the specific non-coding nucleotides with the strongest effects on human traits and diseases.
