eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Methods for Measurement and Interpretation of Gene Expression

No data is associated with this publication.
Abstract

For the last 15 years, the world's sequencing capacity has increased at a staggering rate. While the focus of much of this has been on the sequencing of genomic DNA, the sequencing of transcribed RNAs as a proxy for the expression of genes has also taken a starring role. In this thesis, we consider three themes related to these RNA-seq experiments, operating at three different levels.

We begin by considering the quantification of a single RNA-seq experiment. Here the blessing of cheap sequencing has become something of a curse: data is being generated at such a rate that analyzing it can become an onerous task. To address this, we attempt to do as much as we can with very little. Without aligning our sequencing data to a genome, and forgoing much of the information usually utilized in RNA-seq analyses, we produce quantifications of expression that are close in accuracy to those produced by sophisticated, state-of-the-art models while using a very small fraction of their computational resources.

Next, we consider the quantification of RNA-seq datasets with many samples. While generating such datasets was previously a difficult and expensive undertaking, it is becoming increasingly routine, and so the question of whether they should have their own specialized form of analysis should be addressed. In such datasets, the different samples are almost always related in some sense, which creates the possibility that, rather than analyzing each sample independently, information could be shared between the analyses, thereby improving the accuracy of all. To that end, we propose a method of regularizing the analysis of many-sample RNA-seq, inspired by techniques in machine learning, and provide evidence that it may improve the accuracy of many-sample RNA-seq experiments that are shallowly sequenced and would therefore most likely benefit from such information sharing.

Finally, we consider some aspects of analyzing the results of many-sample RNA-seq experiments. Specifically, we consider experiments in which expression has been measured in many individuals and ask: do we see the signature of population structure in expression data, as we do in genotype data? While the question has been answered in the negative in the past, we show that a more sensitive analysis reveals that population signals do indeed leave their mark on expression.

Main Content

This item is under embargo until November 30, 2025.