Data Biology: A quantitative exploration of gene regulation and underlying mechanisms
- Author(s): Schiller, Benjamin Joseph
- Advisor(s): Yamamoto, Keith R
- et al.
Regulation of gene expression is a fundamental biological process required to adapt the full set of hereditary information (i.e., the genome) to the varied environments that any organism encounters. Here, we elucidate two distinct forms of gene regulation – of endogenous genes by binding of transcription factors to information-containing genomic sequences and of selfish genes (“transposons”) by targeting of small RNAs to repetitive genomic sequences – using a wide array of approaches.
To study regulation by transcription factors, we used glucocorticoid receptor (GR), a hormone-activated, DNA-binding protein that controls inflammation, metabolism, stress responses and other physiological processes. In vitro, GR binds as an inverted dimer to two imperfectly palindromic “half sites” separated by a “spacer”. Moreover, GR binds different sequences with distinct conformations, as demonstrated by nuclear magnetic resonance spectroscopy (NMR) and other biophysical methods.
In vivo, GR employs different functional surfaces when regulating different genes. We investigated whether sequences bound by GR in vivo might be a composite of several motifs, each biased toward utilization of a particular pattern of functional surfaces of GR. Using microarrays and deep sequencing, we characterized gene expression and genomic occupancy by GR, with and without glucocorticoid treatment, in cells expressing GR alleles that differ in three known functional surfaces. We found a “sub-motif”, the GR “half site”, that relates to utilization of the dimerization interface and directs genomic binding by GR in a distinct conformation.
To study repression of transposons, we characterized the production and function of small RNAs in the yeast Cryptococcus neoformans. We found that target transcripts are distinguished by suboptimal introns and inefficient splicing. We identified a complex, SCANR, required for synthesis of small RNAs and demonstrated that it physically associates with the spliceosome. We propose that recognition of gene products by SCANR is in kinetic competition with splicing, thereby further promoting small RNA production from target transcripts.
To achieve these results, we developed new bioinformatics tools: twobitreader, a small Python package for efficient extraction of genomic sequences; scripter, a flexible back-end for easily creating scripts and pipelines; and seriesoftubes, a pipeline built upon scripter for the analysis of deep sequencing data.
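As a rough illustration of why a two-bit representation allows efficient extraction of genomic sequences, the following minimal sketch packs DNA at four bases per byte and decodes only the requested interval. It is not the twobitreader implementation: the function names `pack` and `extract` are hypothetical, the bit assignments are assumed to follow the UCSC .2bit convention (T=00, C=01, A=10, G=11), and the file header, N blocks, and mask blocks of a real .2bit file are omitted.

```python
# Illustrative 2-bit packing of DNA; NOT the twobitreader implementation.
# Assumed code table (UCSC .2bit convention): T=00, C=01, A=10, G=11.
BASE_TO_BITS = {"T": 0, "C": 1, "A": 2, "G": 3}
BITS_TO_BASE = "TCAG"


def pack(sequence):
    """Pack an ACGT string into bytes, four bases per byte.

    The first base of each group of four occupies the two highest bits.
    """
    packed = bytearray((len(sequence) + 3) // 4)
    for i, base in enumerate(sequence.upper()):
        shift = 6 - 2 * (i % 4)
        packed[i // 4] |= BASE_TO_BITS[base] << shift
    return bytes(packed)


def extract(packed, start, end):
    """Return bases in the half-open interval [start, end).

    Only the bytes covering the interval are read, so the cost depends
    on the interval length rather than on the length of the sequence.
    """
    bases = []
    for i in range(start, end):
        shift = 6 - 2 * (i % 4)
        bases.append(BITS_TO_BASE[(packed[i // 4] >> shift) & 0b11])
    return "".join(bases)


if __name__ == "__main__":
    site = "AGAACATTTTGTTCT"  # arbitrary example sequence
    packed = pack(site)
    print(len(site), "bases stored in", len(packed), "bytes")
    print(extract(packed, 9, 15))
```

Because each base sits at a fixed bit offset, any interval can be located by arithmetic on its coordinates, with no need to scan the preceding sequence.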