What Your Genome Doesn't Tell You: Population-level and Genome-wide Epigenomic Surveys of Plants and People
- Author(s): Schultz, Matthew Douglas
- et al.
Over 70 years ago, Conrad Waddington coined the word epigenetics to describe the heritable phenomena that lie between genotype and phenotype, although he did not know what they were. In the intervening years, much has been learned about the molecular mechanisms that could fill this role, and none is as well positioned as DNA methylation, the addition of a methyl group to cytosine nucleotides in DNA. This modification has a well-known mechanism by which it can be maintained during DNA replication, and it is known to associate with certain transcriptional states. It therefore offers a plausible way to influence the activity of genes, as well as a way to be passed along to daughter cells without changing the genome. A key question in the field of epigenetics is whether changes in epigenetic signals can arise in the absence of genotypic changes across generations. To address this problem, we applied the methodologies developed in this work to a population of the model plant Arabidopsis thaliana whose members are genotypically identical but have been self-propagated by single-seed descent for many generations. These plants allowed us to determine the rate and properties of epigenetic changes while holding genotype constant. We next examined the variability of DNA methylation across a population of plants collected from around the world. Although in this instance we could not control for genotype, the number of samples we obtained allowed us to estimate the fraction of DNA methylation variability that is attributable to genotype, further clarifying the amount of genotype-independent DNA methylation variation. Finally, we investigated the DNA methylomes of a variety of human tissues and uncovered DNA methylation differences outside of typical contexts.
Throughout this work, we address these key questions by developing methodologies to analyze data from MethylC-seq, an assay that uses high-throughput sequencing to measure methylation states, as well as statistical procedures to find differences among samples. These data are typically hundreds of gigabytes in scale and require efficient algorithms to process. Furthermore, the base-resolution nature of these data necessitates statistical procedures that take advantage of these measurements.