Statistical Methods and Analyses in Computational Genomics: Explorations of Eukaryotic Transcription
- Author(s): Fischer, Jonathan Robert
- Advisor(s): Song, Yun S; et al.
The introduction of next-generation, or high-throughput, sequencing techniques has fundamentally altered our perception of the genome and transcriptome by permitting the simultaneous study of tens of thousands of distinct transcripts. In recent years, the popularity of next-generation sequencing has risen owing to falling costs and the steady accumulation of novel genetic and genomic discoveries that would have proven difficult to uncover with older approaches. The continued proliferation of these techniques, both in number and in frequency of use, has resulted in unique data types and experimental structures that require analysis and, frequently, methodological development.
In this dissertation, I explore eukaryotic transcription from multiple perspectives by applying both classical and novel statistical methods to data generated by different next-generation sequencing protocols. I begin by constructing spatial nascent transcription profiles based on various RNA Polymerase II footprinting procedures to demonstrate the profound deleterious effect that deleting RNA transcript decay factors has on mRNA production in yeast, bolstering and expanding upon prior evidence of the inextricable link between RNA synthesis and decay. My focus then shifts to the development of a tensor-based method and its application to RNA-seq data of both the bulk and single-cell varieties. This method is intended for data produced by the increasingly common, specially structured experimental designs in which samples share tissues and/or individuals, and I show through simulation and application to human bulk gene expression measurements that it characterizes the transcriptome more robustly and powerfully. I conclude by employing this method jointly with traditional approaches to investigate tissue-specific effects on gene expression as measured in murine single-cell RNA-seq, and I discuss the merits of tensor methods in such a setting.