An iterative strategy combining biophysical criteria and duration hidden Markov models for structural predictions of Chlamydia trachomatis σ66 promoters
- Author(s): Mallios, Ronna Reuben
- Advisor(s): Ojcius, David
- et al.
Promoter identification is crucial for understanding gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends on the stability and topology of DNA in the promoter region, as well as on the binding affinity between the RNA polymerase σ-factor and the promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors. In addition, most promoter models have been trained on data from Escherichia coli. Although transcriptional mechanisms have been shown to be similar among various bacteria, the differences between Escherichia coli and Chlamydia trachomatis (C. trachomatis) may well be large enough to warrant an organism-specific modeling effort for C. trachomatis. The intracellular life cycle of Chlamydiae impedes the study of gene regulation: the bacteria are difficult to purify in large quantities and are resistant to standard genetic manipulation techniques. Consequently, fewer than 40 C. trachomatis σ66 promoters had been mapped at the inception of this study. Using 29 of these experimentally identified promoters as a training set, this research develops an iterative model-building procedure that combines biophysical metrics such as DNA stability, curvature, twist, and stress-induced DNA duplex destabilization with duration hidden Markov model parameters to model C. trachomatis σ66 promoters. The resulting model, MMCTPP1 (Multiple Metric Chlamydia Trachomatis Promoter Prediction), predicts the training set with a high degree of accuracy and provides insights into the structure of the promoter region. Genome-wide MMCTPP1 predictions for C. trachomatis are provided, as well as co-predictions with three other algorithms. The substantial overlap between the MMCTPP1 predictions and those of the other algorithms bolsters the credibility of all four. To validate the genome-wide predictions, 317 recently mapped transcription start sites of annotated C. trachomatis genes were combined with predictions from MMCTPP1 and TSS-PREDICT. Together these map 169 C. trachomatis σ66 promoters, a four-fold increase in established promoters. These promoters will assist researchers in studying gene regulation in C. trachomatis and will enlarge the training set for the development of MMCTPP2. This second-generation multiple metric model will predict C. trachomatis σ66 promoters with increased accuracy and reveal a more refined characterization of their structural features.
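DNA stability is one of the biophysical metrics named in the abstract, but the abstract does not say how it is computed. One common approach, shown here purely as an illustration and not as the method used for MMCTPP1, is a sliding-window duplex free-energy profile built from nearest-neighbor ΔG°37 values. This sketch assumes the SantaLucia (1998) unified nearest-neighbor parameters, ignores initiation and terminal corrections for simplicity, and uses hypothetical function names and a hypothetical default window size. Less negative window sums indicate less stable, more easily melted DNA, such as AT-rich stretches.

```python
# Illustrative sketch only: NOT the MMCTPP1 implementation.
# Unified nearest-neighbor ΔG°37 values (kcal/mol), SantaLucia (1998),
# keyed by the 5'->3' dinucleotide on the top strand. A dinucleotide and
# its reverse complement share a value (e.g. CA and TG).
NN_DG37 = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def duplex_free_energy(seq: str) -> float:
    """Sum of nearest-neighbor stacking ΔG°37 over a sequence.

    Simplification: initiation and terminal AT penalties are omitted.
    """
    seq = seq.upper()
    return sum(NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1))


def stability_profile(seq: str, window: int = 15) -> list[float]:
    """Free energy of every window of length `window` along `seq`.

    The default window of 15 nt is a hypothetical choice for illustration.
    """
    seq = seq.upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    return [duplex_free_energy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    # AT-rich windows (left) should score less negative than GC-rich ones (right).
    print(stability_profile("ATATATGCGCGC", window=4))
```

In a promoter model, a profile like this would be computed around candidate transcription start sites and used as one feature among several; how MMCTPP1 actually weights or combines such features is not described in the abstract.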