Enhancers are non-coding DNA elements found throughout the genome that, in concert with transcription factors, coactivators, and general transcriptional machinery, activate cell-type-specific gene expression. Initial studies on enhancers demonstrated that these regulatory elements contain clusters of transcription factor binding sites that recruit endogenous transcription factors and drive elevated expression of a target gene. These early works highlighted that their regulatory activity was maintained despite alterations in their orientation and/or positioning relative to the targeted gene. Nearly half a century later, enhancers are center stage in efforts to characterize the regulatory components and mechanisms behind development and disease. This dissertation is a study of mammalian enhancers, the genome-wide approaches for their identification, and their contributions to early developmental processes. Chapter 1 provides an overview of the enhancer properties uncovered from various experimental systems and how these properties are harnessed to predict and further dissect enhancer activity. Chapters 2 and 3 comprise two separate projects that involve 1) an extensive in vivo assessment of active enhancers that are hidden from canonical biochemical-based methods for enhancer identification and 2) the characterization of tissue-specific enhancers across the Shox2 locus that regulate early heart, face, and limb development. Altogether, these works demonstrate the critical roles of enhancers in normal organismal development and the ongoing challenge of translating increasingly large datasets into insights on enhancer prediction and function.
Apolipoprotein A5 (APOA5) is a newly described member of the apolipoprotein gene family whose initial discovery arose from comparative sequence analysis of the mammalian APOA1/C3/A4 gene cluster. Functional studies in mice indicated that alteration in the level of APOA5 significantly impacted plasma triglyceride concentrations. Mice over-expressing human APOA5 displayed significantly reduced triglycerides, while mice lacking apoA5 had a large increase in this lipid parameter. Studies in humans have also suggested an important role for APOA5 in determining plasma triglyceride concentrations. In these experiments, polymorphisms in the human gene were found to define several common haplotypes that were associated with significant changes in triglyceride concentrations in multiple populations. Several separate clinical studies have provided consistent and strong support for the effect with 24 percent of Caucasians, 35 percent of African-Americans and 53 percent of Hispanics carrying APOA5 haplotypes associated with increased plasma triglyceride levels. In summary, APOA5 represents a newly discovered gene involved in triglyceride metabolism in both humans and mice whose mechanism of action remains to be deciphered.
With the availability of genomic sequence from numerous vertebrates, a paradigm shift has occurred in the identification of distant-acting gene regulatory elements. In contrast to traditional gene-centric studies in which investigators randomly scanned genomic fragments that flank genes of interest in functional assays, the modern approach begins electronically with publicly available comparative sequence datasets that provide investigators with prioritized lists of putative functional sequences based on their evolutionary conservation. However, although a large number of tools and resources are now available, application of comparative genomic approaches remains far from trivial. In particular, it requires users to dynamically consider the species and methods for comparison depending on the specific biological question under investigation. While there is currently no single general rule to this end, it is clear that when applied appropriately, comparative genomic approaches exponentially increase our power in generating biological hypotheses for subsequent experimental testing.
Determining how transcriptional regulatory signals are encoded in vertebrate genomes is essential for understanding the origins of multi-cellular complexity; yet the genetic code of vertebrate gene regulation remains poorly understood. In an attempt to elucidate this code, we synergistically combined genome-wide gene expression profiling, vertebrate genome comparisons, and transcription factor binding site analysis to define sequence signatures characteristic of candidate tissue-specific enhancers in the human genome. We applied this strategy to microarray-based gene expression profiles from 79 human tissues and identified 7,187 candidate enhancers that defined their flanking gene expression, the majority of which were located outside of known promoters. We cross-validated this method for its ability to de novo predict tissue-specific gene expression and confirmed its reliability in 57 of the 79 available human tissues, with an average precision in enhancer recognition ranging from 32 percent to 63 percent, and a sensitivity of 47 percent. We used the sequence signatures identified by this approach to assign tissue-specific predictions to ~328,000 human-mouse conserved noncoding elements in the human genome. By overlapping these genome-wide predictions with a large in vivo dataset of enhancers validated in transgenic mice, we confirmed our results with a 28 percent sensitivity and 50 percent precision. These results indicate the power of combining complementary genomic datasets as an initial computational foray into the global view of tissue-specific gene regulation in vertebrates.
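For readers less familiar with the two validation metrics quoted above, the following is a minimal sketch of how precision and sensitivity are conventionally computed from overlap counts. The function name and the counts are hypothetical, chosen only so that the arithmetic reproduces the reported 50 percent precision and 28 percent sensitivity; they are not the study's actual numbers.

```python
def precision_and_sensitivity(true_pos, false_pos, false_neg):
    """Return (precision, sensitivity) from overlap counts.

    precision   = correct predictions / all predictions made
    sensitivity = correct predictions / all validated enhancers
    """
    predicted = true_pos + false_pos
    validated = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    sensitivity = true_pos / validated if validated else 0.0
    return precision, sensitivity


# Hypothetical counts (NOT from the study): 28 predictions overlap a
# validated enhancer, 28 do not, and 72 validated enhancers are missed.
precision, sensitivity = precision_and_sensitivity(28, 28, 72)
print(f"precision={precision:.2f}, sensitivity={sensitivity:.2f}")
```

Under these definitions, a method can be precise without being sensitive: half of its calls may be correct while most validated enhancers remain undetected.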
The accumulation of mildly deleterious missense mutations in individual human genomes has been proposed to be a genetic basis for complex diseases. The plausibility of this hypothesis depends on quantitative estimates of the prevalence of mildly deleterious de novo mutations and polymorphic variants in humans and on the intensity of selective pressure against them. We combined analysis of mutations causing human Mendelian diseases, human-chimpanzee divergence and systematic data on human SNPs and found that about 20 percent of new missense mutations in humans result in a loss of function, while about 27 percent are effectively neutral. Thus, more than half of new missense mutations have mildly deleterious effects. These mutations give rise to many low frequency deleterious allelic variants in the human population as evident from a new dataset of 37 genes sequenced in over 1,500 individual human chromosomes. Surprisingly, up to 70 percent of low frequency missense alleles are mildly deleterious and associated with a heterozygous fitness loss in the range 0.001-0.003. Thus, the low allele frequency of an amino acid variant can by itself serve as a predictor of its functional significance. Several recent studies have reported a significant excess of rare missense variants in disease populations compared to controls in candidate genes or pathways. These studies would be unlikely to work if most rare variants were neutral or if rare variants were not a significant contributor to the genetic component of phenotypic inheritance. Our results provide a justification for these types of candidate gene (pathway) association studies and imply that mutation-selection balance may be a feasible mechanism for evolution of some common diseases.
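The "more than half" figure in the abstract above follows from simple subtraction, assuming (as the abstract implies) that loss-of-function, effectively neutral, and mildly deleterious are mutually exclusive and exhaustive classes of new missense mutations. The symbol names below are introduced here for illustration only:

```latex
f_{\text{mild}} \;=\; 1 - f_{\text{LoF}} - f_{\text{neutral}}
\;\approx\; 1 - 0.20 - 0.27 \;=\; 0.53
```

That is, roughly 53 percent of new missense mutations fall in the mildly deleterious class, given the approximate estimates quoted.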