In recent years, mass spectrometry has become a staple technique in biochemistry and molecular biology, with mass spectrometry-based proteomics being one of its greatest successes. The standard method for identifying proteins through the use of mass spectrometry involves a number of steps. First, a solution of whole proteins is digested with a protease, typically trypsin. The resulting peptides are then separated by liquid chromatography, and a full-scan mass spectrum is obtained for each eluting fraction. Because the masses of intact peptide ions alone provide little information that can be used to identify proteins, peptide ions are then selected for secondary fragmentation and a tandem mass spectrum is obtained. From this secondary spectrum, peptide sequence information can be obtained by comparison to proteome databases.
However, despite being a powerful tool for peptide identification, the traditional shotgun proteomics approach often suffers from limited sensitivity and a lack of reproducibility between replicate analyses. A major source of these limitations is the way in which ions are chosen for fragmentation. Because unmodified peptide ions are virtually indistinguishable in a full-scan mass spectrum, the vast majority of experiments select ions for fragmentation based solely on the signal intensity of each ion. In complex samples, this has the often undesired consequence of biasing the analysis towards the most abundant, though often uninteresting, peptides. Furthermore, due to the stochastic nature of ion selection, it is often difficult to reproduce a list of protein identifications even if the same biological sample is used for multiple experiments. This dissertation focuses on the idea of using chemical tagging strategies to introduce information into a complex sample that can then be used to direct MS analysis away from the most abundant species and towards those most likely to be interesting in a given biological context. The technology developed is then applied to the study of protein glycosylation, a type of protein post-translational modification ubiquitous in eukaryotic organisms.
In Chapter 1, current technologies for studying glycoproteins using mass spectrometry are surveyed. The emphasis in this chapter is on the use of unnatural sugar substrates for the metabolic engineering of glycan structures, and on applications of metabolic engineering to glycoproteomics. This chapter also reviews the use of bioorthogonal reactions in the context of glycoproteomics. Finally, the standard workflow for proteomics experiments is examined and the concept of directed mass spectrometry is introduced. Chapter 2 proposes a method for using chemistry to add information to a biological system that can then be used to direct the MS analysis of biomolecules towards a subset of so-called ``information-rich'' ions. This system uses the distinctive isotopic distribution of a chemical label to perturb the isotopic envelope of a biomolecule in a way that is detectable in a full-scan mass spectrum. We term this methodology, coupled with the computational algorithm described in Chapter 3, the IsoStamp system.
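The way a label's isotopic distribution perturbs a biomolecule's isotopic envelope can be illustrated with a minimal sketch. Several assumptions are made here purely for illustration and are not taken from the text: the label is assumed to carry two bromine atoms (chosen because bromine's two isotopes are nearly equal in abundance), the unlabeled peptide envelope is a made-up set of relative intensities, and the function names `label_pattern` and `tag_envelope` are hypothetical. The envelope of the tagged species is obtained by convolving the two distributions.

```python
# Sketch only: assumes a hypothetical two-bromine label; not the author's implementation.

def convolve(a, b):
    """Discrete convolution of two isotope patterns indexed by nominal mass offset."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Natural abundances of 79Br and 81Br; the middle entry is the empty +1 position.
BROMINE = [0.5069, 0.0, 0.4931]

def label_pattern(n_br=2):
    """Isotope pattern of a label containing n_br bromine atoms (assumed label)."""
    pattern = [1.0]
    for _ in range(n_br):
        pattern = convolve(pattern, BROMINE)
    return pattern

def tag_envelope(envelope, n_br=2):
    """Envelope of the labeled species: unlabeled envelope convolved with the label."""
    return convolve(envelope, label_pattern(n_br))

# Hypothetical unlabeled peptide envelope (M, M+1, M+2, M+3); illustrative values only.
peptide = [0.55, 0.30, 0.11, 0.04]
tagged = tag_envelope(peptide)

for offset, intensity in enumerate(tagged):
    print(f"M+{offset}: {intensity:.3f}")
```

Under these assumptions the label contributes a roughly 1:2:1 pattern at M, M+2, and M+4, so the tagged envelope is broader than the untagged one and its most intense peak no longer sits at the monoisotopic position. A shift of this kind is what a pattern-searching step could look for in a full-scan spectrum.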
The isotopic pattern-searching algorithm relies on the ability to accurately predict the isotopic envelope of a peptide solely from the molecular weight of the ion. Such a prediction system is analyzed in Chapter 4, and the scope is extended to applications in glycobiology, including the prediction of isotopic envelopes of biomolecules such as mucins, where a large percentage of the molecular weight is attributed to carbohydrate content. Potential weaknesses of current metabolic oligosaccharide engineering techniques, as they are employed in mass spectrometry, are that they typically require a secondary labeling step and that unnatural sugars may not be incorporated into glycan structures at stoichiometric levels. Chapter 5 introduces an alternative approach whereby an isotopically labeled mixture of a natural substrate, GlcNAc, is fed to cells and is subsequently incorporated into N-glycan structures at stoichiometric levels. N-glycosylated peptides are then targeted for MS/MS analysis based on their isotopic distribution, and sites of modification are determined by comparison to a proteome database. Finally, Chapter 6 examines the future of isotopic labeling in biological mass spectrometry, suggesting a number of applications of the IsoStamp technology.
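The idea of predicting a peptide's isotopic envelope from its molecular weight alone, as analyzed in Chapter 4, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method developed in the dissertation: it uses the widely cited "averagine" average amino acid composition as a stand-in for whatever model the author employs, rounds atom counts to integers, and ignores carbohydrate content, so it would not be appropriate for heavily glycosylated species such as mucins. The names `predict_envelope` and `power` are hypothetical.

```python
# Sketch only: averagine-based approximation, assumed here for illustration.

# Average amino acid composition (atoms per residue) and its average mass in Da.
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254

# Natural isotope abundances, indexed by nominal mass offset from the lightest isotope.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}

def convolve(a, b, max_peaks):
    """Convolve two isotope patterns, keeping only the first max_peaks positions."""
    out = [0.0] * min(len(a) + len(b) - 1, max_peaks)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out

def power(pattern, n, max_peaks):
    """Pattern for n atoms of one element, via repeated squaring of the convolution."""
    result, base = [1.0], pattern
    while n > 0:
        if n & 1:
            result = convolve(result, base, max_peaks)
        base = convolve(base, base, max_peaks)
        n >>= 1
    return result

def predict_envelope(mass, max_peaks=8):
    """Approximate isotopic envelope (M, M+1, ...) for a peptide of the given mass."""
    units = mass / AVERAGINE_MASS
    envelope = [1.0]
    for element, per_unit in AVERAGINE.items():
        count = round(per_unit * units)  # integer atom count for this element
        envelope = convolve(envelope, power(ISOTOPES[element], count, max_peaks), max_peaks)
    total = sum(envelope)  # renormalize after truncation to max_peaks
    return [p / total for p in envelope]

small = predict_envelope(1000.0)
large = predict_envelope(3000.0)
```

In this approximation the monoisotopic peak dominates for a peptide near 1000 Da, while near 3000 Da the M+1 peak overtakes it. A pattern-searching step can compare an observed envelope against such a prediction and flag ions whose envelopes deviate in the way a label or isotopically enriched substrate would cause.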