Novel molecular and bioinformatics approaches to investigate DNA methylation in human epigenome
- Author(s): Diep, Dinh Hue;
- Advisor(s): Zhang, Kun;
- et al.
In recent years, advances in sequencing have enabled the mapping of DNA methylation variation across many human cell types and the identification of candidate DNA methylation biomarkers for clinical applications. The clinical development of these biomarkers has been limited, however, by the high cost and inflexibility of current experimental and bioinformatics tools.
We improved the design of bisulfite padlock probes (BSPP) to greatly increase efficiency and throughput for targeted DNA methylation quantification. The cost-effectiveness and scalability of this approach were demonstrated on hundreds of samples using a set of 330,000 probes. We also developed a bioinformatics pipeline that performs SNP calling on bisulfite data and quantifies DNA methylation with reduced errors across various assay types.
Despite the many bioinformatics tools available for differential DNA methylation analysis, there is a need for a more general computational tool to characterize DNA methylation variability in reference data. Therefore, we developed a new method for identifying differentially methylated regions (DMRs), cgDMR-miner, and a variability score to quantify DNA methylation variation across multiple groups of samples. On simulated datasets with 5X average depth of coverage, cgDMR-miner identified 42% of simulated DMRs with 73% precision, while the next best approach identified 23% of simulated DMRs with 96% precision. Thus cgDMR-miner can identify potential targets from a shallow, low-accuracy initial screen that can later be validated with a deeper screen using a targeted assay.
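To make the trade-off in the benchmark figures above concrete, the following minimal sketch combines each method's recall (fraction of simulated DMRs identified) and precision into a single F1 score. The F1 score is an assumption introduced here for illustration and is not a metric reported in this abstract; the function and variable names are likewise hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (not a metric from the source)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recall and precision as quoted for the simulated 5X datasets.
reported = {
    "cgDMR-miner": {"recall": 0.42, "precision": 0.73},
    "next best approach": {"recall": 0.23, "precision": 0.96},
}

f1_by_method = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, f1 in f1_by_method.items():
    print(f"{name}: F1 = {f1:.2f}")
# cgDMR-miner: F1 = 0.53
# next best approach: F1 = 0.37
```

Under this assumed summary metric, the higher recall outweighs the lower precision, which is consistent with using the method as a first-pass screen whose candidates are validated later.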
Lastly, the coordinated methylation of nearby CpG sites was investigated in order to identify more robust biomarkers for cancer. Starting with a set of 147,888 identified regions of tightly coupled CpG methylation, termed methylation haplotype blocks (MHBs), the linked status of CpGs within these regions was found to be useful for biomarker identification in human tissue samples and human cell-free DNA.
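One way to quantify how tightly two CpG sites are coupled is the squared correlation of their methylation states across sequencing reads, analogous to the r² statistic used for genetic linkage disequilibrium. The abstract does not state which coupling measure was used, so the choice of r², the read encoding (1 = methylated, 0 = unmethylated), and the function name below are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def pairwise_r2(reads: Sequence[Sequence[int]], i: int, j: int) -> Optional[float]:
    """Squared correlation between the methylation states of CpG i and CpG j.

    Each read is a sequence of 0/1 calls covering the same CpG sites
    (assumed encoding: 1 = methylated, 0 = unmethylated).
    Returns None when either site shows no variation, since r^2 is undefined.
    """
    n = len(reads)
    if n == 0:
        return None
    p_i = sum(r[i] for r in reads) / n             # methylation frequency at site i
    p_j = sum(r[j] for r in reads) / n             # methylation frequency at site j
    p_ij = sum(r[i] * r[j] for r in reads) / n     # frequency of both methylated
    d = p_ij - p_i * p_j                           # deviation from independence
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return None
    return d * d / denom


# Toy reads: fully coupled sites versus independent sites.
coupled = [(1, 1), (1, 1), (0, 0), (0, 0)]
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(pairwise_r2(coupled, 0, 1))      # 1.0
print(pairwise_r2(independent, 0, 1))  # 0.0
```

In this toy setting, values near 1 indicate that neighboring CpGs tend to be methylated or unmethylated together on the same read, which is the kind of linked status the paragraph above describes.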