Functional Genomics for Improving Gene Function Assessment in Bacteria
- Author(s): Shao, Wenjun
- Advisor(s): Arkin, Adam P
- Savage, David F
- et al.
Functional genomics uses system-wide approaches to generate genome-scale data and to describe gene functions. Recent genome-wide transcriptome studies have observed a great number of unexpected transcripts internal or antisense to known genes in bacteria and archaea, but the function of these unexpected transcripts is unclear. Here, we use the metal-reducing bacterium Shewanella oneidensis MR-1 and its relatives to study the evolutionary conservation of unexpected transcriptional start sites (TSSs).
In the first part of this thesis, we present the methodology used to generate a set of high-confidence TSSs. Using high-resolution tiling microarrays and 5’-end RNA sequencing, combined with a semi-supervised machine learning approach, we identified 2,531 TSSs in S. oneidensis MR-1. We then classified them by their position relative to the current gene model; 18% of the identified TSSs were located inside coding sequences (CDSs).
In the second part of this thesis, we present the conservation study of the high-confidence TSSs identified in MR-1. Comparative transcriptome analysis with seven additional Shewanella species revealed that the majority (76%) of the TSSs within the upstream regions of annotated genes (gTSSs) were conserved. Of the TSSs that were inside genes and on the sense strand (iTSSs), 30% were also conserved. Sequence analysis around these iTSSs showed conserved promoter motifs, suggesting that many iTSSs are under purifying selection. Furthermore, conserved iTSSs are enriched for regulatory motifs, suggesting that they are regulated. Combining these results with genome-wide mutagenesis data, we show that the presence of internal promoters significantly reduces the polar effects that would be expected if the internal promoters were not functional.
In contrast, transcription from antisense TSSs located inside CDSs (aTSSs) was significantly less likely to be conserved (22%). However, aTSSs whose transcription was conserved often have conserved promoter motifs and drive the expression of nearby genes. Overall, our findings demonstrate that some internal TSSs are conserved and drive protein expression despite their unusual locations, but the majority are not conserved and may reflect noisy initiation of transcription rather than a biological function.
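The positional classes used above (gTSS, iTSS, aTSS) can be illustrated with a minimal sketch. This is not the thesis's pipeline: the `Gene` record, the `classify_tss` function, the 200 bp upstream window, and the rule that an upstream match takes priority over an internal one are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"


# Assumed size of the "upstream region"; the abstract does not state one.
UPSTREAM_WINDOW = 200


def classify_tss(pos, strand, genes, upstream=UPSTREAM_WINDOW):
    """Classify a TSS as 'gTSS', 'iTSS', 'aTSS', or 'other'."""
    # gTSS: within the upstream window of a gene on the same strand.
    # Checking this first is an assumption about priority.
    for g in genes:
        if g.strand != strand:
            continue
        if strand == "+" and g.start - upstream <= pos < g.start:
            return "gTSS"
        if strand == "-" and g.end < pos <= g.end + upstream:
            return "gTSS"

    # Genes whose coordinates contain the TSS position.
    containing = [g for g in genes if g.start <= pos <= g.end]
    if any(g.strand == strand for g in containing):
        return "iTSS"  # inside a gene, sense strand
    if containing:
        return "aTSS"  # inside a gene, antisense strand
    return "other"


# Toy gene model (made-up coordinates, not S. oneidensis annotations).
genes = [Gene(1000, 2000, "+"), Gene(3000, 4000, "-")]
for pos, strand in [(950, "+"), (1500, "+"), (1500, "-"), (4100, "-")]:
    print(pos, strand, classify_tss(pos, strand, genes))
```

A real classification would also need to handle overlapping genes, operon structure, and the distinction between CDS and full gene coordinates, none of which this sketch attempts.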
In the last part of the thesis, I present the development of a high-throughput assay, bacterial two-hybrid sequencing (B2H-seq), for constructing protein interactomes in bacteria. This technique, if successful, will complement the existing large-scale mutant fitness profiling method in the Arkin lab and improve gene function annotation.