Sequencing provides a way to read out the molecular information carrier that unifies all life: DNA. The ability to access this cellular information has reinvented how biological systems are quantified and studied, as high-throughput sequencing can provide genome-wide and single-base resolution snapshots of biological activity in virtually any cell type. As sequencing technology has decreased in cost and become more accessible, it has also become the foundation for a growing variety of applications beyond recording genomic identity, and has been extended to obtain RNA expression profiles, epigenetic profiles, and even protein profiles. This dissertation contributes to the field by presenting novel techniques aimed at improving the efficiency of RNA sequencing and enabling multi-modal DNA methylation profiling. These advancements are applied to develop new insights in classically elusive biological systems, including non-model microbial samples and human germ cell development.
Highly efficient bacterial mRNA sequencing: Microbiomes are some of the most diverse and underexplored biological systems on Earth, and one of the limitations for discovery and characterization is the relatively low efficiency of bacterial RNA sequencing compared to eukaryotic workflows. RNA sequencing is a widely used technique to quantify the distribution of mRNA molecules in a population to gain a genome-wide understanding of cellular functions and phenotypes. However, in bacteria, mRNA sequencing is inefficient due to the abundance of ribosomal RNA (rRNA), which is challenging to deplete. Unlike eukaryotic mRNA, bacterial mRNA lacks the poly-A tail that typically enables efficient capture and enrichment of mRNA away from the abundant rRNA molecules in a cell. To improve the efficiency of mRNA sequencing in bacterial samples, this dissertation reports EMBR-seq (Enrichment of mRNA by Blocked rRNA) and EMBR-seq+, two methods that specifically deplete 5S, 16S, and 23S rRNA using blocking primers to prevent their amplification and RNase H for targeted digestion. EMBR-seq is highly sensitive and successfully quantified the transcriptome from samples containing as little as 20 pg of total RNA. Furthermore, EMBR-seq+ was applied to examine differential expression in anaerobic bacteria grown in monoculture versus co-culture with anaerobic fungi, and revealed that the presence of the anaerobic fungi induces bacterial transcriptome remodeling, including the downregulation of lignocellulose-degrading machinery.
Simultaneous profiling of 5mC, 5hmC, and RNA: In mammals, the DNA modification 5-methylcytosine (5mC) is a known regulator of gene expression, and 5mC profiles are therefore closely tied to cell identity. During early embryogenesis, as distinct cell lineages are established from pluripotent stem cells, the cells that are specified as primordial germ cells (PGCs) are rapidly demethylated before developing into mature gametes. The mechanisms governing DNA demethylation in human PGCs remain unclear, and this currently limits our understanding of gametogenesis in both healthy and disease trajectories. Both active and passive demethylation are known to contribute to methylation erasure, although new techniques are required to assess the timing and heterogeneity of each mechanism across the PGC population. Therefore, to probe both modes of demethylation and profile cellular identity, this dissertation presents a strand-specific sequencing method, “single-cell methylation, transcriptome, and hydroxymethylation-seq” (scMTH-seq), which captures 5mC, 5-hydroxymethylcytosine (5hmC), and RNA simultaneously in single cells. Strand-specific 5mC counts can detect passive demethylation through wide distributions of strand asymmetry, while increased global 5hmC levels serve as an indicator of active demethylation. When applied to stem cell-derived human PGC-like cells, scMTH-seq identified two distinct transcriptional groups, one of which was passively demethylating. Based on differential expression between the two groups, it is likely that the genes DND1 and SOX15 are key drivers that initiate PGC demethylation and maturation.