Penalty-Based Dynamic Programming for the Identification of Post-Translational Modifications in Peptide Mass Spectra
- Author(s): Bernstein, Laurence Elliot
- Advisor(s): Bandeira, Nuno; Briggs, Steven; et al.
Tandem mass spectrometry (MS/MS) has long been the leading method for identifying peptides and proteins in complex biological samples, and many algorithms have been created for this purpose. Many of the methods for searching MS/MS spectra against a database of known proteins must restrict the number of post-translational modifications (PTMs) that they can identify: the larger the number of PTMs being considered, the larger the search space, which in turn increases both computational complexity and the potential for false matches. In addition, these algorithms cannot discover new peptides or homologues, nor can they be used with species for which no protein database exists. Newer algorithms have been developed that perform “open” or “blind” searches capable of finding any possible modification; however, these methods increase the search space even further, often resulting in lower performance and generating many putative modification masses that must be sifted through manually to determine which are real.
To address the shortcomings of the existing methods, we created a new blind database search algorithm based on spectral networks. Our method combines a modification of the standard spectral tagging filtration techniques, tailored for the contig-consensus spectra generated from spectral networks, with a first-of-its-kind, penalty-based dynamic programming spectrum-database alignment algorithm that accurately identifies both a priori specified modifications and novel PTMs. We then developed a workflow based on these new techniques that combines previous work in clustering, spectral alignment, spectral networks, and multi-spectral assembly. Because our new algorithm identifies only spectra that lie within the spectral networks, we created a workflow, called RaVen, that merges our method with MS-GF+ and combines the results of both. The resulting method substantially improves overall identification rates over existing methods while also identifying many more rare modifications in samples. We also propose an improved way of measuring the accuracy of blind search algorithms, “peptide variants,” which better captures the goals of blind search methods and does not rely on precise localization of modifications (which is very difficult for most algorithms to achieve).
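The abstract does not give the alignment recurrence, so the following is only a minimal, generic sketch of what a penalty-based dynamic programming spectrum-peptide alignment can look like, not the dissertation's algorithm. It assumes integer (nominal) masses, a spectrum already reduced to a set of prefix-ion masses, no mass tolerance, and a single fixed penalty per modification. The names `align` and `AA_MASS` are hypothetical. Each residue either contributes its exact mass or is treated as modified (any positive total residue mass) at a cost, and the alignment must end at the parent mass.

```python
from typing import Dict, List, Optional, Set, Tuple

# Nominal (integer) residue masses in Da for a subset of amino acids.
AA_MASS: Dict[str, int] = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
    "T": 101, "L": 113, "K": 128, "E": 129, "M": 131,
}
NEG = float("-inf")


def align(peptide: str, peaks: Set[int], parent_mass: int,
          penalty: float) -> Tuple[float, List[Tuple[int, int]]]:
    """Hypothetical penalty-based alignment of one peptide to one spectrum.

    The spectrum is reduced to a set of integer prefix masses (`peaks`).
    Score = matched internal prefix peaks - penalty * (number of modifications).
    Returns (score, [(residue_index, mass_shift), ...]).
    """
    assert penalty > 0, "a positive penalty keeps zero-shift 'modifications' out"
    n, P = len(peptide), parent_mass
    # dp[i][m]: best score after i residues whose (modified) prefix mass is m.
    dp = [[NEG] * (P + 1) for _ in range(n + 1)]
    back: List[List[Optional[int]]] = [[None] * (P + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        a = AA_MASS[peptide[i - 1]]
        prev = dp[i - 1]
        run_best, run_arg = NEG, None  # best prev[m'] over all m' < m
        for m in range(P + 1):
            gain = 1.0 if (0 < m < P and m in peaks) else 0.0
            # Unmodified residue: the prefix mass grows by exactly `a`.
            if m >= a and prev[m - a] > NEG:
                dp[i][m] = prev[m - a] + gain
                back[i][m] = m - a
            # Modified residue: any positive total residue mass, at a fixed penalty.
            if run_arg is not None and run_best - penalty + gain > dp[i][m]:
                dp[i][m] = run_best - penalty + gain
                back[i][m] = run_arg
            if prev[m] > run_best:
                run_best, run_arg = prev[m], m
    score = dp[n][P]
    if score == NEG:
        return score, []
    # Trace back from the parent mass and report non-zero mass shifts.
    mods: List[Tuple[int, int]] = []
    m = P
    for i in range(n, 0, -1):
        prev_m = back[i][m]
        assert prev_m is not None
        shift = m - prev_m - AA_MASS[peptide[i - 1]]
        if shift != 0:
            mods.append((i - 1, shift))
        m = prev_m
    mods.reverse()
    return score, mods
```

For example, `align("GAS", {57, 144}, 231, 0.5)` returns `(1.5, [(1, 16)])`: both prefix peaks are explained by placing a +16 Da shift on the alanine, at the cost of one penalty. Because the modified step always moves to a strictly larger prefix mass, no peak can be counted twice. One natural extension (not shown, and not claimed to be the dissertation's scheme) would charge a smaller penalty for a priori specified mass shifts than for novel ones.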