Protein similarity has been used to annotate and classify proteins when their structures are available. Similarity comparisons may be made on a local or global basis and may consider sequence information as well as differing levels of structural information. This dissertation details Surflex-PSIM, a local 3D method that compares the surfaces of protein binding sites.
PSIM compares binding-site surfaces in full atomic detail. The approach is based on the morphological similarity method Surflex-Sim, which has been widely applied to the global comparison of small molecules. PSIM can distinguish between very similar proteins that differ in ligand-binding specificity. It can also correctly align extremely divergent proteins that share only a small region of similarity. PSIM performed well on established benchmarks for binding-site comparison.
In a docking benchmark study, PSIM was used to support multi-structure docking protocols. In such protocols, careful selection of target structures can reduce screening time and improve accuracy. PSIM was used to automatically select a minimal, representative set of docking target conformations. Several docking targets for which a single-structure protocol had yielded unsatisfactory results improved substantially under the PSIM-aided multi-structure protocol.
Further development of an automated binding-site detection algorithm allowed PSIM to be used as a screening tool for annotating proteins of unknown function. A dataset was constructed of proteins whose functions were determined only after their structures had been crystallized. For a majority of these proteins, PSIM automatically detected binding sites and matched them to proteins with the same function that were already present in the PDB when the query structures were crystallized. PSIM was further used to explore possible functions of several proteins whose functions remain unknown.
The main contribution of this dissertation is a fast and accurate method for comparing protein binding sites independently of sequence information. The method has applications in ligand-specificity analysis and in the annotation of proteins of unknown function.