Skip to main content
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations bannerUC San Diego

Structure-Preserving Rearrangements: Algorithms for Structural Comparison and Protein Analysis


Protein structure is fundamental to a deep understanding of how proteins function. Since structure is highly conserved, structural comparison can provide deep information about the evolution and function of protein families. The Protein Data Bank (PDB) continues to grow rapidly, providing copious opportunities for advancing our understanding of proteins through large-scale searches and structural comparisons. In this work I present several novel structural comparison methods for specific applications, as well as apply structure comparison tools systematically to better understand global properties of protein fold space.

Circular permutation describes a relationship between two proteins where the N-terminal portion of one protein is related to the C-terminal portion of the other. Proteins that are related by a circular permutation generally share the same structure despite the rearrangement of their primary sequence. This non-sequential relationship makes them difficult for many structure alignment tools to detect. Combinatorial Extension for Circular Permutations (CE-CP) was developed to align proteins that may be related by a circular permutation. It is widely available due to its incorporation into the RCSB PDB website.

Symmetry and structural repeats are common in protein structures at many levels. The CE-Symm tool was developed in order to detect internal pseudosymmetry within individual polypeptide chains. Such internal symmetry can arise from duplication events, so aligning the individual symmetry units provides insights about conservation and evolution. In many cases, internal symmetry can be shown to be important for a number of functions, including ligand binding, allostery, folding, stability, and evolution.

Structural comparison tools were applied comprehensively across all PDB structures for systematic analysis. Pairwise structural comparisons of all proteins in the PDB have been computed using the Open Science Grid computing infrastructure, and are kept continually up-to-date with the release of new structures. These provide a network-based view of protein fold space. CE-Symm was also applied to systematically survey the PDB for internally symmetric proteins. It is able to detect symmetry in ~20% of all protein families. Such PDB-wide analyses give insights into the complex evolution of protein folds.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View