DNA methylation plays crucial roles in many biological processes, and abnormal DNA methylation patterns are frequently observed in disease. Recent studies have shed light on cis-acting DNA elements that regulate locus-specific DNA methylation. Importantly, these discoveries have shown potential for clinical application.
In this thesis, I first examine the current biological foundation for the cis-acting genetic code that regulates DNA methylation, a regulatory process that involves transcription factors, histone modifications, and DNA secondary structure. In chapter 2, we demonstrate how to identify the functional motifs that regulate DNA methylation. By analyzing 34 diverse whole-genome bisulfite sequencing datasets, we identified 313 motifs: 92 associated with methylation (methylation motifs, MMs) and 221 associated with unmethylation (unmethylation motifs, UMs). We show that these motifs are associated with local methylation levels and that disruption of a motif by mutation significantly alters the methylation levels of CpGs in neighboring regions. Combined with somatic mutations, these motifs improve the prediction of cancer subtypes and patient survival.
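One common way to quantify "motif disruption by mutation" is to score a sequence window against a PWM before and after a single-nucleotide variant. The sketch below illustrates that idea under stated assumptions: the toy frequency matrix, the uniform background, the pseudocount, and the helper names (`log_odds`, `best_hit`, `disruption`) are all hypothetical and are not taken from the thesis pipeline. A negative score change means the variant weakens the best motif match.

```python
import math

ALPHABET = "ACGT"


def log_odds(pfm, background=0.25, pseudocount=0.01):
    """Convert per-position base frequencies (rows ordered A, C, G, T) into
    log2-odds scores against a uniform background (an assumption)."""
    pwm = []
    for row in pfm:
        total = sum(row) + pseudocount * len(row)
        pwm.append([math.log2((p + pseudocount) / total / background) for p in row])
    return pwm


def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring motif window in seq."""
    width = len(pwm)
    best = (-math.inf, -1)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(pwm[i][ALPHABET.index(base)] for i, base in enumerate(window))
        best = max(best, (score, start))
    return best


def disruption(pwm, ref, pos, alt):
    """Change in best motif score when ref[pos] is replaced by alt (an SNV)."""
    mut = ref[:pos] + alt + ref[pos + 1:]
    return best_hit(pwm, mut)[0] - best_hit(pwm, ref)[0]


if __name__ == "__main__":
    # Hypothetical toy motif whose consensus is ACGT.
    toy_pfm = [
        [0.90, 0.03, 0.03, 0.04],
        [0.03, 0.90, 0.03, 0.04],
        [0.03, 0.03, 0.90, 0.04],
        [0.04, 0.03, 0.03, 0.90],
    ]
    pwm = log_odds(toy_pfm)
    print(disruption(pwm, "TTACGTTT", 3, "A"))  # SNV inside the motif hit
    print(disruption(pwm, "TTACGTTT", 0, "A"))  # SNV outside the motif hit
```

In a real analysis the score change would then be related to the methylation levels of nearby CpGs; this sketch only covers the scoring step.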
DNA motif analysis frequently requires an intuitive understanding and a convenient representation of motifs. In chapter 3, I review how motifs are typically represented as position weight matrices (PWMs) and propose a new wildcard-style consensus sequence representation based on mutual information theory and Jensen-Shannon divergence. We name this representation the sequence Motto and have implemented an efficient algorithm, with flexible options, for converting PWMs over nucleotide, amino acid, and customized alphabets into Motto. Meanwhile, experimental validation of cis-acting DNA elements benefits from recent advances in CRISPR/Cas9-mediated genetic screening. In chapter 4, I present CRISPY, a lightweight, robust CRISPR screening pipeline that unifies the single-sgRNA and CREST-seq screening protocols and can profile peak candidates using existing histone modification, DNase I hypersensitive site (DHS), and ATAC-seq data from human and mouse.
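To make the idea of a wildcard-style consensus concrete, the following is a minimal sketch that assigns each PWM column the IUPAC nucleotide ambiguity code whose uniform distribution over its bases has the smallest Jensen-Shannon divergence (base 2) from the column. This is an illustrative assumption, not the Motto algorithm: Motto also draws on mutual information and supports amino acid and custom alphabets, which this toy does not. The function names and the example matrix are hypothetical.

```python
import math

ALPHABET = "ACGT"
# Standard IUPAC nucleotide ambiguity codes and the bases each one covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence in bits (bounded by 0 and 1)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def code_distribution(bases):
    """Uniform distribution over the bases covered by one ambiguity code."""
    return [1 / len(bases) if b in bases else 0.0 for b in ALPHABET]


def to_wildcard_consensus(pfm):
    """Pick, per column (A, C, G, T order), the code closest in JSD."""
    consensus = []
    for column in pfm:
        total = sum(column)
        p = [x / total for x in column]
        code = min(IUPAC, key=lambda c: jsd(p, code_distribution(IUPAC[c])))
        consensus.append(code)
    return "".join(consensus)


if __name__ == "__main__":
    toy_pfm = [
        [0.97, 0.01, 0.01, 0.01],  # nearly pure A
        [0.48, 0.02, 0.48, 0.02],  # A or G
        [0.25, 0.25, 0.25, 0.25],  # uninformative
        [0.02, 0.02, 0.02, 0.94],  # nearly pure T
    ]
    print(to_wildcard_consensus(toy_pfm))
```

Comparing each column against uniform distributions over base subsets keeps the choice symmetric and bounded, which is one reason JSD is a natural fit for this kind of wildcard assignment.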
Together, our studies provide new insights into how the cis-acting genetic code regulates DNA methylation, and these findings can be translated into clinical applications. In addition, we provide tools to efficiently represent motifs and evaluate their functions in a high-throughput manner.