High-throughput tandem mass spectrometry has enabled the detection and identification of over 75\% of the proteins predicted to be translated from human protein-coding genes, drawing on tens of terabytes of public data spread across thousands of datasets. This thesis explores what we can learn from this achievement, as well as the challenges that arise when considering proteomics data at repository scale. First, we consider validating what is known, through resources to build, curate, and explore both FDR-controlled and user-submitted libraries. Second, we present a tool that automates the application of strict community guideline criteria to any set of search results, including peak quality checks and novel FDR controls. Third, we introduce a method to illuminate the extent of what is not yet known, using a new clustering approach designed to capture peptide diversity by explicitly modeling spectrum coelution. Fourth, we develop a method for extremely fast single-spectrum searches against repositories of billions of spectra, both to confirm or refute knowledge-base identifications and to discover spectra similar to those that consistently remain unidentified.