In multicellular organisms, many different cell types share the same genetic information, yet each performs a particular role. Many players, such as transcription factors, nucleosome positioning, histone post-translational modifications and non-coding RNAs, contribute to regulating the expression of specific genes, thereby defining each cell state. Once an expression pattern is established during development, it is faithfully maintained throughout the life of the organism. In many organisms, a key component of this epigenetic regulation is a covalent modification of cytosine, 5-methylcytosine (5meC), known as DNA methylation. When present at gene promoters, DNA methylation is associated with transcriptional repression; within genes, it can suppress aberrant intragenic transcription. The focus of my doctoral research has been the development of methods to map the genome-wide distribution of 5meC and to understand how DNA methyltransferases are recruited to their targets.
Because mammalian genomes are large, current approaches to map methylation across the entire genome are expensive. Several methods have therefore been developed to assess the methylation status of only part of the genome: some rely on probe-based enrichment, others on enzymatic digestion. Chapters 1 and 2 are based on such methods. Reduced-Representation Bisulfite Sequencing (RRBS) captures the majority of CpG islands and promoters. Because only about 1% of the genome is assessed with this technique, sequencing costs are dramatically reduced. In Chapter 1, this technique is used to identify CpG sites whose methylation levels are associated with complex disease traits. Most CpG sites in the genome are methylated and show little variation in methylation level between cell types, suggesting that some of the fragments isolated by RRBS provide little useful information.
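Like other bisulfite-based methods, RRBS relies on the fact that sodium bisulfite converts unmethylated cytosines to uracil (read as T after amplification), whereas 5meC is protected and still reads as C. The methylation level at a CpG site can therefore be estimated as the fraction of reads that retain a C at that position. The following is a minimal Python sketch, assuming per-site read counts have already been extracted from aligned reads; the `CpGCounts` class, the `methylation_level` function, the 10-read coverage threshold and the toy counts are hypothetical illustrations, not the pipeline used in Chapter 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpGCounts:
    """Hypothetical per-CpG read counts from bisulfite sequencing."""
    chrom: str
    pos: int
    methylated: int    # reads showing C: cytosine protected from bisulfite conversion
    unmethylated: int  # reads showing T: cytosine converted by bisulfite


def methylation_level(site: CpGCounts, min_coverage: int = 10) -> Optional[float]:
    """Fraction of reads supporting methylation, or None if coverage is too low."""
    coverage = site.methylated + site.unmethylated
    if coverage < min_coverage:
        return None  # too few reads for a reliable estimate
    return site.methylated / coverage


# Toy counts for illustration only (not real data).
sites = [
    CpGCounts("chr1", 100, methylated=18, unmethylated=2),
    CpGCounts("chr1", 200, methylated=3, unmethylated=27),
    CpGCounts("chr1", 300, methylated=4, unmethylated=1),
]
for s in sites:
    print(s.chrom, s.pos, methylation_level(s))
```

The coverage filter reflects a common design choice: a level computed from only a handful of reads is too noisy to compare between samples, so low-coverage sites are reported as missing rather than as a potentially misleading value.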
To overcome this limitation, we improved an existing method, Methylation-sensitive Restriction Enzyme sequencing (MRE-seq), which enriches for poorly methylated regions (approximately 20% of the genome). The resulting method, MRE-BS (MRE-Bisulfite Sequencing), is described in Chapter 2. Its costs are similar to those of RRBS, but the development of a multiple regression model has allowed us to estimate differential methylation between two samples across 60% of the genome.
Chapter 3 focuses on how de novo DNA methyltransferases are recruited to their target sites. This work has shown that murine DNMT3b is guided by histone post-translational modifications, both when expressed in yeast and in primordial germ cells. In general, DNMT3b is absent from regions marked by H3K4me3 and is recruited to gene bodies by H3K36me3.
The last chapter focuses on the presence of 5meC in messenger RNA, whose function is still unknown. We discovered several hundred putative methylation sites that are associated with predicted secondary structures in mRNAs. This finding might be explained either by the recruitment of RNA methyltransferases (RMTs) by structural motifs or by limitations of the method used. Further experiments are needed to determine which signals direct RMTs to their target sites and what function this mark serves.