RNA transcription is the primary route through which genomes determine the phenotype of organisms. However, proper execution of this process on a genome-wide scale has two requirements. First, the DNA of transcribed genes must lie within portions of the genome that are accessible to RNA polymerase. Second, the mRNAs produced from these genes must be stable enough in the cytoplasm to be translated. To complicate matters, many genomes are saturated with transposable elements (TEs) that must remain silent and not be transcribed. My work focuses on the molecular mechanisms and evolutionary processes that govern DNA accessibility and RNA structure and stability. My first chapter focused on methylated CHH (mCHH) islands. These are short regions of high methylation near genes, and they are linked to TE silencing and to partitioning the genome into actively transcribed and non-transcribed components. We analyzed the evolutionary conservation of mCHH islands among grass (family Poaceae) genomes, as well as their relationships with gene expression, genic methylation, and proximity to TEs. We found that mCHH islands were seldom conserved at orthologous genes across species, but they often corresponded to insertions of particular DNA transposon families. They were also significantly negatively associated with genic methylation but positively associated with gene expression. Based on these findings, we propose a model in which mCHH islands are a consequence of aberrant transcription leading to RNA-directed DNA methylation.
An unsolved mystery in genome partitioning is how TEs are initially identified and targeted for silencing. One observed route is through hairpin secondary structures that form when TEs escape silencing and are transcribed. These hairpins act as a signal that allows structured transcripts to be processed into small (21–24-nt) RNAs, which then guide methylation of complementary regions of the genome. My second chapter analyzed the genome-wide prevalence of this phenomenon in maize (Zea mays). We found that it is widespread across many types of TEs. We also found that hairpin-like structures within genes have the same effect. The prevalence of these structures despite their epigenetic effects suggests a conflict between RNA function and stability.
Finally, I studied the evolutionary dynamics of mRNA secondary structure in Arabidopsis thaliana, using a novel method to identify derived mutations that disrupt ancestral mRNA secondary structures. I found that these mutations, even those at synonymous sites, occur at lower frequencies than putatively neutral mutations in the global Arabidopsis population. Using population genetic data, I estimated the selective effects of these mutations: they are more deleterious than putatively neutral mutations but less deleterious than most missense mutations. Their population frequencies also varied geographically among Arabidopsis subpopulations and were correlated with temperature. I hypothesize that these temperature correlations arise because the stability of RNA secondary structures depends in part on temperature.