Computational inference of transcriptional regulation in eukaryotes
- Author(s): Liu, Jie;
- et al.
Inference of transcriptional regulation, which includes discovering the binding sites of a transcription factor (TF), identifying its direct target genes and detecting its dynamic activity, is an important step towards reconstructing transcriptional networks. In this dissertation, I have developed three novel computational methods to tackle this task by integrating large-scale genomic data. All three methods train a probabilistic model using sequence motif, gene expression, TF binding and conservation data. Integrating multiple data sources in a single probabilistic model reduces the impact of noise in any individual data set. Mathematically, the methods maximize the joint likelihood of the observed data using the expectation-maximization (EM) algorithm. The hidden variables in the models represent the identity of a gene (target or non-target, in TRANSMODIS and CompMODEM) and the activity of a TF (in ActivMiner). The EM algorithm iteratively estimates these hidden variables and the model parameters. The three methods have different purposes. The first two, TRANSMODIS and CompMODEM, aim to identify the binding sites and direct target genes of TFs. TRANSMODIS exploits the observation that the target genes of a TF normally share similar sequence motifs in the TF binding regions and similar gene expression patterns across conditions. When only a single gene expression or TF binding experiment is available, CompMODEM also models the conservation of TF binding sites, in addition to the sequence and expression information, because functional regulatory sites tend to be evolutionarily constrained. Both TRANSMODIS and CompMODEM assume that the TF of interest is active. When such information is not available, ActivMiner aims to infer the dynamic activity of TFs and their regulatory targets simultaneously. These methods have been successfully applied to multiple species, including human, worm and yeast.
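To make the role of the hidden target/non-target variable concrete, the following is a minimal sketch of EM on a toy two-class mixture. It is not the TRANSMODIS, CompMODEM or ActivMiner model: the diagonal-Gaussian likelihood, the function name `em_two_class`, the midrange initialisation and the per-gene feature matrix `x` (for example, one motif score and one expression score per gene) are all assumptions made for illustration.

```python
import numpy as np

def em_two_class(x, n_iter=50):
    """Toy EM for a two-component diagonal-Gaussian mixture (illustrative only).

    x: (n_genes, n_features) array of hypothetical per-gene evidence scores.
    Returns the posterior probability that each gene belongs to class 1 ("target").
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Assumed initialisation: genes above the midrange of feature 0 start as targets.
    mid = 0.5 * (x[:, 0].min() + x[:, 0].max())
    resp = (x[:, 0] > mid).astype(float)
    for _ in range(n_iter):
        # M-step: re-estimate class priors, means and variances from soft labels.
        w = np.stack([1.0 - resp, resp])            # shape (2, n_genes)
        nk = w.sum(axis=1) + 1e-12                  # effective class sizes
        prior = nk / n
        mean = (w @ x) / nk[:, None]                # shape (2, n_features)
        var = np.maximum((w @ x**2) / nk[:, None] - mean**2, 1e-6)
        # E-step: posterior of the hidden class label given current parameters.
        diff2 = (x[None, :, :] - mean[:, None, :]) ** 2
        log_lik = -0.5 * (diff2 / var[:, None, :]
                          + np.log(2.0 * np.pi * var[:, None, :])).sum(axis=2)
        log_post = log_lik + np.log(prior)[:, None]
        log_post -= log_post.max(axis=0)            # stabilise before exponentiating
        post = np.exp(log_post)
        resp = post[1] / post.sum(axis=0)
    return resp
```

Calling `em_two_class(x)` on a feature matrix returns one probability per gene; genes with values near 1 are the inferred targets under this toy model. Each additional data source would enter as a further factor in the per-gene likelihood, which is the sense in which integration can reduce the influence of noise in any single measurement.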
The studies presented in this dissertation lay a foundation for inferring gene regulatory networks, which remains a great challenge in the post-genome era. With the rapid accumulation of genomic data, these methods provide a set of useful tools for understanding transcriptional mechanisms.