eScholarship
Open Access Publications from the University of California

UC Irvine

UC Irvine Electronic Theses and Dissertations

Duplication and cis/trans regulatory variants: Evolutionary genomics perspectives on gene regulation

Creative Commons 'BY-NC' version 4.0 license
Abstract

Variation in gene expression contributes significantly to phenotypic variation. As a result, genomic regions encoding gene regulatory elements, in addition to protein-coding loci, are predicted to be under selection. This dissertation uses two genetic model organisms, budding yeast and fruit fly, to explore the genetic basis of gene expression variation within species. The first chapter lays out the general background: it introduces the genetic architecture of gene expression, the cis/trans model, the allele-specific expression approach, and other commonly used methods in genomic studies.

The second chapter explores the relationship between gene duplication and gene expression level. Gene duplication is thought to be the primary mechanism for producing new genes. However, a newly duplicated gene copy must persist in the population long enough to gain a novel function. If the new copy changes gene expression in a deleterious direction, it will soon be eliminated by purifying selection. We therefore ask how gene duplication affects expression, both between the duplicated genotype and the single-copy genotype and between paralogs within the duplicated genotype. We compared the genomes of strains of Drosophila melanogaster, focusing on 35 newly duplicated nuclear genes, and compared gene expression levels between the two duplicated paralogs and between singletons and doublets. All 16 analyzable genes show differential expression between paralogs under the binomial model. The other 19 genes either have copies that are 100% identical in sequence or have more than two duplicated copies, rendering analysis of copy-specific expression patterns impossible or ambiguous. For total expression level, most of the genes show elevated expression, though the magnitude of change shows no clear relationship with the number of copies. This work is the first such genome-wide survey of duplicated gene expression to employ comparisons of reference-grade genome assemblies. This approach ensures that we discover duplicates previously hidden from short-read-based methods.
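As a rough illustration of what testing for differential expression between paralogs "under the binomial model" can look like, the sketch below runs an exact two-sided binomial test on reads assigned to each of two copies. The read counts are hypothetical, the function name is invented for this example, and the null of equal expression (p = 0.5) is an assumption; the abstract does not specify the exact test used in the thesis.

```python
import math

def binomial_two_sided_p(k, n):
    """Exact two-sided binomial test against p = 0.5.

    k: reads assigned to paralog A; n: reads assigned to either paralog.
    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Integer arithmetic keeps the sum exact.
    """
    observed = math.comb(n, k)
    tail = sum(c for c in (math.comb(n, i) for i in range(n + 1))
               if c <= observed)
    return tail / 2 ** n

if __name__ == "__main__":
    # Hypothetical counts: 70 of 100 copy-specific reads come from paralog A.
    print(binomial_two_sided_p(70, 100))
```

A small p-value here only says that the two copies differ in read share under a model with no replicate-to-replicate variance, which is the limitation the third chapter addresses.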

The third chapter presents a novel implementation of a statistical model for inferring allele-specific expression. Commonly used binomial models ignore the variance among biological replicates, which leads to many false positives. We implemented a beta-binomial model and demonstrated its advantages with both simulated and experimental data. The allele-specific expression dataset of 20 biological replicates not only yields a more accurate landscape of expression variation but also provides a resource for model testing in future studies.
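The false-positive problem can be illustrated with a small simulation. The sketch below is not the implementation from the thesis: it assumes 20 replicates at 200 reads each, replicate-level allelic proportions drawn from a symmetric Beta(20, 20) so that there is no true allelic imbalance, a pooled binomial test using the normal approximation, and a crude grid-search likelihood-ratio test for the beta-binomial. All function names, grids, and parameter values are hypothetical.

```python
import math
import random

MU_GRID = [i / 100 for i in range(40, 61)]           # candidate mean proportions
RHO_GRID = [10 ** (-3 + 0.2 * i) for i in range(13)]  # candidate overdispersions

def betabinom_loglik(counts, totals, mu, rho):
    """Beta-binomial log-likelihood with mean mu and overdispersion rho.
    The binomial coefficient is dropped because it cancels in the ratio."""
    s = (1.0 - rho) / rho                 # a + b, since rho = 1 / (a + b + 1)
    a, b = mu * s, (1.0 - mu) * s
    const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return sum(math.lgamma(k + a) + math.lgamma(n - k + b)
               - math.lgamma(n + a + b) + const
               for k, n in zip(counts, totals))

def max_loglik(counts, totals, mus):
    """Grid-search maximum of the log-likelihood over mus and RHO_GRID."""
    return max(betabinom_loglik(counts, totals, mu, rho)
               for mu in mus for rho in RHO_GRID)

def betabinom_lrt_pvalue(counts, totals):
    """Likelihood-ratio test of mu = 0.5, chi-square with 1 df."""
    lr = 2.0 * (max_loglik(counts, totals, MU_GRID)
                - max_loglik(counts, totals, [0.5]))
    return math.erfc(math.sqrt(max(0.0, lr) / 2.0))

def pooled_binomial_pvalue(counts, totals):
    """Two-sided test of p = 0.5 on pooled reads (normal approximation)."""
    k, n = sum(counts), sum(totals)
    z = (k - 0.5 * n) / math.sqrt(0.25 * n)
    return math.erfc(abs(z) / math.sqrt(2.0))

def simulate_null(rng, n_reps=20, depth=200, conc=20.0):
    """One gene with no true imbalance but replicate-level variation."""
    counts = []
    for _ in range(n_reps):
        p = rng.betavariate(conc, conc)
        counts.append(sum(rng.random() < p for _ in range(depth)))
    return counts, [depth] * n_reps

def false_positive_rates(n_datasets=100, alpha=0.05, seed=1):
    """Fraction of null genes called significant by each test."""
    rng = random.Random(seed)
    fp_binom = fp_bb = 0
    for _ in range(n_datasets):
        counts, totals = simulate_null(rng)
        fp_binom += pooled_binomial_pvalue(counts, totals) < alpha
        fp_bb += betabinom_lrt_pvalue(counts, totals) < alpha
    return fp_binom / n_datasets, fp_bb / n_datasets

if __name__ == "__main__":
    print(false_positive_rates())
```

Under these assumptions the pooled binomial test treats every read as independent, so variation among replicates is mistaken for signal, while the beta-binomial absorbs it into the overdispersion parameter. A real analysis would replace the grid search with a proper optimizer.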

The fourth chapter contributes to a debate regarding the commonly reported compensatory evolution in the regulatory control of expression. We demonstrate from statistical principles that the compensatory evolution observed in allele-specific expression studies might merely be a measurement artifact. We then discuss an improved method and demonstrate that it reduces the negative correlation (an indicator of compensatory evolution) mediated by shared error. Inferences are made with both simulated and published data.
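The shared-error argument can be sketched with a toy simulation. It assumes the standard decomposition in which the cis effect is estimated from the hybrid allelic log-ratio and the trans effect is the parental log-ratio minus that same hybrid estimate, so any error in the hybrid measurement enters the two estimates with opposite signs. The remedy shown, estimating cis and trans from independent hybrid replicates, is one possible way to break the shared error and is not necessarily the method developed in the thesis. All names and parameter values are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_cis_trans(n_genes=2000, error_sd=1.0, seed=7):
    """Return (naive, split) correlations between cis and trans estimates.

    True cis and trans effects are drawn independently, so any
    correlation between the estimates comes from measurement error.
    """
    rng = random.Random(seed)
    cis_hat, trans_naive, trans_split = [], [], []
    for _ in range(n_genes):
        cis = rng.gauss(0.0, 1.0)     # true cis effect (log-ratio scale)
        trans = rng.gauss(0.0, 1.0)   # true trans effect, independent of cis
        parental = cis + trans + rng.gauss(0.0, error_sd)
        hybrid_1 = cis + rng.gauss(0.0, error_sd)   # hybrid replicate 1
        hybrid_2 = cis + rng.gauss(0.0, error_sd)   # independent replicate 2
        cis_hat.append(hybrid_1)
        # Naive: the same hybrid measurement appears in both estimates.
        trans_naive.append(parental - hybrid_1)
        # Split: trans uses a hybrid replicate not used for cis.
        trans_split.append(parental - hybrid_2)
    return pearson(cis_hat, trans_naive), pearson(cis_hat, trans_split)

if __name__ == "__main__":
    naive_r, split_r = simulate_cis_trans()
    print("naive:", naive_r, "split:", split_r)
```

With these settings the naive estimates are expected to correlate at about -1/sqrt(6), even though the simulated effects are independent, while the split estimates should be close to uncorrelated.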

The fifth and final chapter summarizes the thesis. It also points out several unsolved problems and puts forward future directions.
