Proteins function in many biochemical processes essential to life. Our ability to understand how these functions relate to both the sequence and structure of proteins is paramount to understanding the underpinnings of myriad diseases and indeed life itself. Unfortunately, our understanding of this relationship can, at times, be quite limited. Subtle variations in sequence and structure can lead to quite different functions. However, computational methods, paired with biochemical characterization, can help better define these complex relationships.
One approach to this goal is the analysis of a protein's superfamily, defined as a group of evolutionarily related proteins that conserve some aspect of function, even though the overall functions of individual proteins may be quite distinct. By comparing evolutionarily related proteins, new insights can be gained into the evolution of function and, by proxy, into the relationship between a protein's sequence, structure, and function.
Large-scale analyses of protein sequences and structures of the six-bladed β-propeller fold class were conducted using a variety of computational tools. These studies identified more than 2500 sequences belonging to a superfamily of enzymes termed the Nucleophilic Attack 6-bladed β-Propeller (N6P) superfamily, in which an apparent mechanistic similarity involving nucleophilic attack appeared to be conserved. This superfamily was further classified into three subgroups: the arylesterase-like, the senescence marker protein-30/gluconolactonase/luciferin-regenerating enzyme-like (SGL), and the strictosidine synthase-like (SSL) subgroups. The first two subgroups had previously been shown to catalyze hydrolytic reactions, whereas the SSL subgroup had only been shown to catalyze a condensation reaction.
Interestingly, most SSL proteins differed from known strictosidine synthase (SS) enzymes in that they contained metal-coordinating active site ligands that are common to the rest of the superfamily but missing in SS. This insight allowed nearly 500 sequences annotated as "strictosidine synthase" to be identified as functionally misannotated proteins. Furthermore, function predictions based on the superfamily context were made for several hundred of these SSL proteins, and one such prediction was biochemically confirmed by a collaborating laboratory. Unfortunately, many SSL proteins proved difficult to work with in the wet lab, and as such, many more questions remain regarding the reaction specificity of these proteins.