Genome-wide mapping and analysis of mammalian promoters
- Author(s): Barrera, Leah Ortiz-Luis
- et al.
Mammalian organisms such as mouse and human are characterized by large genomes of 2-3 billion base pairs. Sequencing of these genomes has revealed that only a small fraction, ~1.5%, encodes proteins. The diversity of the more than 200 cell types that make up mammals, from the zygote to the differentiated cell types that perform the functions of organs, is brought about by the coordinated expression of specific subsets of these genes. Control of gene expression is, in turn, mediated by the binding of transcription factors at non-coding genomic regulatory sequences such as promoters, enhancers, and insulators. Unraveling the control of gene expression that gives rise to mammalian cell type diversity thus entails the accurate and systematic characterization of these sequences. In this work, we describe a pilot application of chromatin immunoprecipitation with microarrays (ChIP-chip) to define active promoters in human fibroblast cells. To do this, we mapped the genomic location of components of the transcription pre-initiation complex (PIC) using microarrays tiling the entire non-repetitive human genome sequence at 100 bp resolution. The scale and novelty of this high-throughput strategy entailed significant bioinformatics challenges. In particular, we highlight our model-based approach for the accurate identification of binding sites from the data. Interestingly, this pilot identification of 10,567 active promoters revealed the extent of alternative promoter usage within a single cell type, clustering of active promoters, and classes of genes based on PIC binding and transcript expression level. We then extended our genome-wide promoter mapping strategy to characterize active promoters in mouse embryonic stem cells (mES) and adult organs. We mapped ~24,000 promoters across these samples, including 5,153 sites validating cap-analysis of gene expression (CAGE) 5' end data in addition to 16,976 annotated mRNA 5' ends.
To profile promoter usage across tissues by relative occupancy of RNA polymerase II (Pol II), we adapted a quantitative index of tissue-specificity, thereby overcoming the limitations of a binary "bound" or "unbound" classification. We examined the sequence and epigenetic features of tissue-specific promoters defined by this measure and discovered a subset of promoters with enriched Pol II binding in mES that are persistently marked by H3K4me3 in adult tissues.
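
To illustrate what a quantitative tissue-specificity index can look like, the following is a minimal sketch only. The abstract does not state which index was adapted, so the Shannon entropy measure used here is an assumption chosen for illustration, and the function name `tissue_specificity`, the tissue labels, and the occupancy values are all hypothetical. For one promoter, the relative Pol II occupancy across tissues is treated as a probability distribution: low entropy suggests occupancy concentrated in few tissues, and the per-tissue score is lowest for the tissue that dominates a specific promoter.

```python
import math


def tissue_specificity(occupancy):
    """Score one promoter from its Pol II signal per tissue.

    occupancy: dict mapping tissue name -> non-negative signal
    (hypothetical input format, not taken from the source).
    Returns (entropy_bits, per_tissue_score).
    """
    total = sum(occupancy.values())
    if total <= 0:
        raise ValueError("promoter has no positive signal in any tissue")

    # Relative occupancy: each tissue's share of the total signal.
    rel = {tissue: value / total for tissue, value in occupancy.items()}

    # Shannon entropy in bits: 0 when a single tissue holds all the signal,
    # log2(number of tissues) when the signal is spread evenly.
    entropy = -sum(p * math.log2(p) for p in rel.values() if p > 0)

    # Per-tissue score: entropy minus log2 of that tissue's share.
    # Smaller values mean the promoter is more specific to that tissue.
    score = {
        tissue: (entropy - math.log2(p)) if p > 0 else math.inf
        for tissue, p in rel.items()
    }
    return entropy, score


# Hypothetical example values, for illustration only.
uniform = {"mES": 1.0, "brain": 1.0, "heart": 1.0, "liver": 1.0}
specific = {"mES": 8.0, "brain": 0.0, "heart": 0.0, "liver": 0.0}

h_uniform, q_uniform = tissue_specificity(uniform)
h_specific, q_specific = tissue_specificity(specific)
```

Unlike a bound/unbound call made separately in each tissue, a continuous score of this kind ranks promoters by how unevenly occupancy is distributed, so no per-tissue threshold has to be chosen.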