Generating functions of tandem mass spectra and their applications for peptide identifications
Abstract
Mass spectrometry (MS) has become the leading high-throughput technology for proteomics, the large-scale study of proteins. MS experiments generate tandem mass (MS/MS) spectra, each representing a peptide. Identifying peptides from MS/MS spectra is a basic and essential task in proteomics studies. MS instruments and experimental protocols are advancing rapidly; however, the software tools to interpret MS/MS spectra are lagging behind, with many computational problems remaining unsolved. In this dissertation, we present a novel approach to interpreting MS/MS spectra, called the generating function approach, and show how this approach enables us to solve key computational problems in MS. First, we address the problem of estimating the statistical significance of Peptide-Spectrum Matches (PSMs). Since typically less than 30% of the generated spectra can be correctly interpreted, this problem is important in distinguishing between correct and incorrect PSMs. Using the generating function approach, we present the first analytical (rather than empirical) solution to this problem. Our MS-GF tool not only improves the accuracy of statistical significance estimates, but also increases the number of peptide identifications at a fixed error rate. Next, we present an alternative approach to peptide identification based on generating all plausible de novo interpretations of a spectrum (a spectral dictionary) and then quickly matching them against the protein database. Our MS-Dictionary tool enables proteogenomic searches in six-frame translations of genomic sequences that may be prohibitively time-consuming with traditional methods. We also present spectral profiles, a new representation of tandem mass spectra that compactly represents spectral dictionaries. Spectral profiles can be used to generate gapped peptides that are as useful as full-length peptides and as accurate as the peptide sequence tags of length 3 traditionally used to speed up database searches.
Lastly, we present a new database search tool, MS-GF+, based on MS-GF. MS-GF+ is sensitive (it identifies more peptides than other database search tools) and universal (it works well for diverse types of spectra, different configurations of MS instruments, and different experimental protocols). We benchmark MS-GF+ using diverse types of spectral datasets and show that, for all these datasets, MS-GF+ significantly increases the number of identified peptides compared to state-of-the-art methods for peptide identification.
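To make the idea of an analytical (rather than empirical) significance estimate concrete, the following is a minimal toy sketch, not the MS-GF implementation. It assumes integer residue masses, a hypothetical four-letter alphabet, uniform amino acid probabilities, and a simplified score that counts how many prefix masses of a peptide coincide with spectrum peaks. Under those assumptions, a dynamic program over masses yields the exact score distribution of all peptides with a given parent mass, and the significance of a score is the total probability of peptides scoring at least that high. The names `MASSES`, `AA_PROB`, `score_distribution`, and `spectral_probability` are illustrative only.

```python
from collections import defaultdict

# Hypothetical toy alphabet with rounded integer residue masses (Da).
# A realistic setting would use all 20 amino acids and measured frequencies.
MASSES = {"G": 57, "A": 71, "S": 87, "V": 99}
AA_PROB = 1.0 / len(MASSES)  # assumed uniform amino acid probability


def score_distribution(peaks, parent_mass):
    """Return {score: total probability of peptides of mass parent_mass}.

    Toy score (an assumption): the number of nonzero prefix masses of the
    peptide, including the full parent mass, that appear in `peaks`.
    """
    peaks = set(peaks)
    # table[m][s] = total probability of all prefixes with mass m and score s
    table = [defaultdict(float) for _ in range(parent_mass + 1)]
    table[0][0] = 1.0  # the empty prefix has mass 0 and score 0
    for m in range(1, parent_mass + 1):
        gain = 1 if m in peaks else 0
        for aa_mass in MASSES.values():
            prev = m - aa_mass
            if prev < 0:
                continue
            # Extend every prefix of mass `prev` by one residue.
            for s, p in table[prev].items():
                table[m][s + gain] += p * AA_PROB
    return dict(table[parent_mass])


def spectral_probability(peaks, parent_mass, threshold):
    """Total probability of peptides whose toy score is >= threshold."""
    dist = score_distribution(peaks, parent_mass)
    return sum(p for s, p in dist.items() if s >= threshold)


if __name__ == "__main__":
    # Only GA and AG have mass 128 in this alphabet; GA matches both peaks.
    print(score_distribution([57, 128], 128))
    print(spectral_probability([57, 128], 128, 2))
```

Because the table is indexed by mass and score rather than by peptide, the computation never enumerates peptides explicitly, which is what makes an exact (non-sampled) estimate feasible in this simplified setting.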