Multiplexed Approaches to Characterize Sequence-Function Relationships of Escherichia coli Promoters
- Author(s): Urtecho, Guillaume
- Advisor(s): Kosuri, Sriram
- et al.
Despite decades of intense genetic, biochemical, and evolutionary characterization of promoters in Escherichia coli, we still lack the basic ability to identify which genomic sequences are promoters and to determine the transcriptional activity of these sequences. Furthermore, roughly two-thirds of the 2,565 reported E. coli operons do not contain any transcription factor binding site annotations, highlighting how poorly we understand the regulation of these essential genetic components. In my thesis work, I sought to address this gap by 1) identifying promoters in the E. coli genome, 2) discovering the regulatory elements within these promoters that encode their activity, and 3) characterizing the combinatorial interactions between promoter regulatory elements to learn how they cooperatively determine expression.
During the early stages of my graduate work, I developed a genomically encoded massively parallel reporter assay to measure the promoter activity of hundreds of thousands of DNA sequences simultaneously in E. coli. In Chapter 3, we used this technology to perform the first full characterization of autonomous promoter activity in E. coli, measuring the promoter activity of >300,000 sequences spanning the entire genome and precisely mapping 2,859 promoters active in rich media. After identifying promoters in E. coli, we sought to deconstruct these sequences to learn how they encode promoter regulation. We performed scanning mutagenesis of the discovered promoters and identified the sequence motifs and elements within these regions, providing insights into the regulation of 1,158 E. coli operons. Overall, we generated a genome-wide atlas of promoters in E. coli as well as a rich dataset for future computational modeling projects to dissect the relationship between DNA sequence and promoter function.
Concurrent with our work to identify promoters and their regulatory sequence elements, we also sought to learn how combinations of these regulatory sequences cooperatively determine expression. Basic promoters are composites of multiple discrete sequence elements recognized by RNA polymerase (RNAP). While it is known that the overall affinity of RNAP for these elements determines the strength of the promoter, it has been unclear how combinations of these motifs collectively determine promoter activity. To explore this, in Chapter 4 we measured the activity of over 10,000 synthetic promoters composed of different combinations of RNAP binding sites that spanned a range of affinities. We learned that synergistic and other non-linear interactions between RNAP binding sites are responsible for a significant proportion of the variance in promoter activity, and that by capturing these interactions in a statistical model we can predict the activity of promoters with over 95% accuracy. Furthermore, we discovered a novel phenomenon: promoters composed of the strongest sequence elements perform worse than expected, as their excessively strong binding affinities prevent RNAP from escaping to perform transcription. In Chapter 5, we expand on this analysis to study combinatorial interactions in the context of repression by LacI. By studying 8,269 lacUV5 promoter variants composed of different combinations of RNAP and LacI repressor sites, we examined the interactions between these sites in a variety of binding site arrangements. This work revealed the principal relationships between RNAP and repressors and provided insight into how to tune repressor binding site affinities to maximize the inducibility of promoters for synthetic biology applications.
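The idea behind the Chapter 4 modeling, that interaction terms between binding sites explain variance an additive model misses, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the thesis: the two-element layout (a -35 and a -10 variant per promoter), the hypothetical strength values, the saturating toy response, the simulated noise, and the use of ordinary least squares. It is not the statistical model used in the work, only one simple way to compare an additive fit with a fit that includes pairwise interaction terms.

```python
import numpy as np

def design_matrix(a_idx, b_idx, n_a, n_b, interactions):
    """Intercept + one-hot main effects for two elements; optionally pairwise interactions."""
    n = len(a_idx)
    # Drop the first level of each element so it serves as the reference category.
    A = np.eye(n_a)[a_idx][:, 1:]
    B = np.eye(n_b)[b_idx][:, 1:]
    X = np.column_stack([np.ones(n), A, B])
    if interactions:
        # One column per (A level, B level) pair: the product of the two indicators.
        pair = np.einsum("ni,nj->nij", A, B).reshape(n, -1)
        X = np.column_stack([X, pair])
    return X

def r_squared(X, y):
    """Fraction of variance in y explained by a least-squares fit on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

# Toy, simulated data (hypothetical): 4 variants of each element, 3 replicates per pair.
rng = np.random.default_rng(0)
strength = np.array([0.0, 1.0, 2.0, 3.0])  # assumed per-variant contributions
a_idx, b_idx = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
a_idx = np.repeat(a_idx.ravel(), 3)
b_idx = np.repeat(b_idx.ravel(), 3)

# Assumed non-additive response: contributions add up, then saturate at 4.
log_expr = np.minimum(strength[a_idx] + strength[b_idx], 4.0)
log_expr = log_expr + rng.normal(0.0, 0.1, size=log_expr.shape)

r2_additive = r_squared(design_matrix(a_idx, b_idx, 4, 4, interactions=False), log_expr)
r2_full = r_squared(design_matrix(a_idx, b_idx, 4, 4, interactions=True), log_expr)
print(f"additive R^2: {r2_additive:.3f}")
print(f"with interactions R^2: {r2_full:.3f}")
```

In this toy setting the interaction model fits each element pair separately, so it recovers the saturation that the additive model cannot represent. With real data, held-out evaluation would be needed to check that the added terms are not simply fitting noise.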
These projects have greatly expanded what is known about the native E. coli promoter landscape, revealed insights into how promoter organization influences their roles, and deconstructed the interactions between the sequence elements that compose promoters. Beyond advancing knowledge of E. coli promoter regulation, the technologies and methodologies we developed in this work can be used to characterize virtually any genetically tractable bacterium.