Spectral Library Construction and Matching of MS/MS Spectra at Repository Scales
The characterization of proteins, peptides, metabolites, and natural products is crucial to understanding biological processes, discovering biomarkers, and uncovering new therapeutic molecules. Tandem mass spectrometry (MS/MS) has proven to be a high-throughput and sensitive tool for assaying these molecules. The fragmentation pattern observed in a molecule's MS/MS spectrum serves as a reproducible signature of that molecule. Consequently, the MS/MS spectra acquired for a molecule can be aggregated into a reusable collection of observed and annotated spectra known as a spectral library.
Because a molecule's MS/MS spectrum is reproducible, spectral libraries have gained traction as a resource for the sensitive identification of newly acquired MS/MS spectra. A new spectrum is identified by matching it against the library's annotated spectra. The utility of spectral libraries therefore rests on the reliability of MS/MS similarity metrics as well as on the quality and size of the libraries themselves.
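The abstract does not name a specific similarity metric. One widely used choice for comparing MS/MS spectra is a peak-matched cosine score, sketched below under stated assumptions. Spectra are represented as lists of `(m/z, intensity)` peaks. Peaks match when their m/z values fall within a fixed tolerance, and the 0.02 default is an illustrative assumption. Intensities are square-root scaled, which is a common but assumed preprocessing choice. The function name `cosine_similarity` is hypothetical and not taken from the dissertation's software.

```python
import math


def cosine_similarity(spec_a: list[tuple[float, float]],
                      spec_b: list[tuple[float, float]],
                      tolerance: float = 0.02) -> float:
    """Greedy peak-matched cosine score between two centroided MS/MS spectra.

    Hypothetical sketch: peaks are (m/z, intensity) pairs, and the tolerance
    and square-root scaling are illustrative assumptions.
    """
    # Square-root scaling keeps a few very intense peaks from dominating the score.
    a = [(mz, math.sqrt(inten)) for mz, inten in spec_a]
    b = [(mz, math.sqrt(inten)) for mz, inten in spec_b]
    norm_a = math.sqrt(sum(w * w for _, w in a))
    norm_b = math.sqrt(sum(w * w for _, w in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0

    # Enumerate every candidate peak pair whose m/z values agree within tolerance.
    candidates = []
    for ia, (mz_a, w_a) in enumerate(a):
        for ib, (mz_b, w_b) in enumerate(b):
            if abs(mz_a - mz_b) <= tolerance:
                candidates.append((w_a * w_b, ia, ib))

    # Greedily accept the highest-scoring pairs, using each peak at most once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    score = 0.0
    for product, ia, ib in candidates:
        if ia not in used_a and ib not in used_b:
            used_a.add(ia)
            used_b.add(ib)
            score += product

    # Normalizing by both vector norms bounds the score to [0, 1].
    return score / (norm_a * norm_b)
```

For example, a spectrum compared with itself scores 1.0 (up to floating-point rounding), while two spectra with no peaks in common score 0.0. Greedy matching is a simple approximation. An optimal one-to-one peak assignment would be slower but can score slightly higher when peaks are densely packed.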
In this dissertation, we present the computational methods we developed to enable the creation of spectral libraries for proteomics, metabolomics, and natural products discovery. These methods include the aggregation and analysis of the entire community's mass spectrometry data, along with online computational resources that crowdsource the annotation and curation of specialized spectral libraries. Further, by leveraging repository-scale mass spectrometry data, we have developed methods to assign statistical significance to spectral similarity scores, enabling the automated identification of MS/MS spectra by matching them to spectral libraries.
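The abstract does not specify how statistical significance is assigned to similarity scores. One widely used approach in library searching is the target-decoy strategy. Each query spectrum is matched against both real (target) library spectra and artificial (decoy) spectra. The false discovery rate (FDR) at a score cutoff is then estimated as the number of decoy matches divided by the number of target matches at or above that cutoff. The sketch below illustrates that general idea only and is not necessarily the dissertation's method. The function name `fdr_threshold` and the 1% default are hypothetical.

```python
from typing import Optional


def fdr_threshold(target_scores: list[float],
                  decoy_scores: list[float],
                  max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cutoff whose estimated target-decoy FDR is at most max_fdr.

    Hypothetical sketch of the generic target-decoy approach. FDR at a cutoff
    is estimated as (#decoy matches >= cutoff) / (#target matches >= cutoff).
    Returns None if no cutoff satisfies max_fdr.
    """
    # Pool all match scores, labeling each as a target (True) or decoy (False).
    labeled = [(s, True) for s in target_scores] + [(s, False) for s in decoy_scores]
    labeled.sort(key=lambda pair: pair[0], reverse=True)

    n_target = n_decoy = 0
    best = None
    for i, (score, is_target) in enumerate(labeled):
        if is_target:
            n_target += 1
        else:
            n_decoy += 1
        # Evaluate a cutoff only after every match tied at this score is counted.
        last_of_tie = i == len(labeled) - 1 or labeled[i + 1][0] != score
        if last_of_tie and n_target > 0 and n_decoy / n_target <= max_fdr:
            best = score  # scores descend, so later hits are lower cutoffs
    return best
```

Matches scoring at or above the returned cutoff would then be reported as identifications at the chosen estimated FDR. The walk keeps the lowest qualifying cutoff, which mirrors how q-value-style thresholds are usually chosen.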