A Global View of Structure-function Relationships in the Tautomerase Superfamily
- Author(s): Davidson, Rebecca
- Advisor(s): Babbitt, Patricia
- et al.
In the genomic age, sequence data accumulate far faster than they can be experimentally validated, so computational methods are needed to probe this influx. Although only a small fraction of these proteins can be characterized experimentally, the sequences themselves carry a tremendous amount of information in the genetic diversity they capture. This work develops strategies for harnessing sequence similarity networks to their fullest potential within the context of a particular enzyme superfamily, the Tautomerase Superfamily (TSF).
Sequence similarity networks (SSNs) are a type of similarity network in which nodes represent proteins (or groups of similar proteins, as in representative networks) and edges represent the similarity between two nodes. Edges are well defined: each is calculated from the pairwise BLAST E-value. These networks provide a graphical view of the similarity relationships within a set of proteins and facilitate large-scale analyses. They can also be studied analytically using a variety of algorithms, and they allow all-by-all comparisons of tens of thousands of proteins in an intuitively accessible manner.
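As a minimal sketch of the construction described above, the following Python builds an SSN from pairwise E-values and groups proteins into connected clusters. The function names, the input format (triples of two protein identifiers and an E-value), and the cutoff of 1e-20 are assumptions made for illustration; they are not taken from this work.

```python
def build_ssn(hits, evalue_cutoff=1e-20):
    """Build an undirected SSN as an adjacency dict.

    hits: iterable of (protein_a, protein_b, evalue) triples, e.g. parsed
    from all-by-all BLAST output (hypothetical input format).
    An edge is kept only when the E-value is at or below the cutoff
    (the cutoff value is an illustrative assumption).
    """
    graph = {}
    for a, b, evalue in hits:
        # Register both proteins as nodes even if no edge passes the cutoff.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b and evalue <= evalue_cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(graph):
    """Return the connected components of the network as a list of sets."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Toy data: the C-D hit is too weak to pass the cutoff, so two clusters form.
example_hits = [
    ("A", "B", 1e-50),
    ("B", "C", 1e-30),
    ("C", "D", 1e-5),
    ("D", "E", 1e-40),
]
example_ssn = build_ssn(example_hits)
example_clusters = clusters(example_ssn)
```

Tightening or relaxing `evalue_cutoff` changes which edges survive, which is how a single set of pairwise comparisons can be viewed at coarser or finer levels of clustering.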
In this work, a new method of probing SSNs is presented and applied to the TSF. The method identifies a similarity path through a network along which the sequences exhibit transitional functional features. This path was then used to guide target selection for crystallization of transition linker proteins. Those targets are presented in the context of a larger bioinformatics analysis, including a phylogenetic reconstruction, together with an experimental kinetic analysis.
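The idea of a path through a similarity network can be illustrated with a generic graph search. The sketch below uses breadth-first search to find a path with the fewest edges between two proteins in an adjacency-dict network. It is only an illustration: the abstract does not specify how the similarity path was computed, and the function name and toy network are hypothetical.

```python
from collections import deque


def similarity_path(graph, source, target):
    """Return one fewest-edge path from source to target, or None.

    graph: dict mapping each node to a set of neighboring nodes
    (an unweighted, thresholded similarity network).
    Breadth-first search is a stand-in here, not the method of this work.
    """
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        # Sorting makes the traversal order deterministic.
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Toy network: B and C link the otherwise unconnected proteins A and D.
toy_network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "X": set(),
}
linker_path = similarity_path(toy_network, "A", "D")
```

In this toy case the intermediate nodes on the returned path play the role of linker candidates, which is the sense in which a path can guide target selection.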
Additionally, a classification of the TSF into subgroups and, in some cases, a finer level of clustering is provided. This curated work, complete with sequence alignments and HMMs, is hosted on the Structure-Function Linkage Database for open access by the scientific community; the alignments are also available on GitHub.