Correcting for sparsity and interdependence in glycomics by accounting for glycan biosynthesis.
Published Web Location: https://doi.org/10.1038/s41467-021-25183-5
Glycans are fundamental cellular building blocks, involved in many organismal functions. Advances in glycomics are elucidating the essential roles of glycans. Still, it remains challenging to properly analyze large glycomics datasets, since the abundance of each glycan depends on many other glycans with which it shares intermediate biosynthetic steps. Furthermore, the overlap of measured glycans can be low across samples. We address these challenges with GlyCompare, a glycomic data analysis approach that accounts for the biosynthetic steps shared among all measured glycans to correct for sparsity and non-independence in glycomics. This enables direct comparison of different glycoprofiles and increases statistical power. Using GlyCompare, we study diverse N-glycan profiles from glycoengineered erythropoietin. We obtain biologically meaningful clustering of mutant cell glycoprofiles and identify knockout-specific effects of fucosyltransferase mutants on tetra-antennary structures. We further analyze human milk oligosaccharide profiles and find that the mother's fucosyltransferase-dependent secretor status indirectly impacts sialylation. Finally, we apply our method to mucin-type O-glycans, gangliosides, and site-specific compositional glycosylation data to reveal tissue- and disease-specific glycan presentations. Our substructure-oriented approach will enable researchers to take full advantage of the growing power and size of glycomics data.
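The substructure-oriented idea can be illustrated with a minimal Python sketch. It is not the GlyCompare implementation: the glycan names, substructure labels, abundances, and the `substructure_abundance` helper are all hypothetical, and the sketch assumes that a substructure's abundance is the summed abundance of every measured glycan containing it. Two samples that share only one measured glycan still become comparable across several shared substructures.

```python
from collections import defaultdict

# Hypothetical toy mapping: each glycan -> the substructures (biosynthetic
# intermediates) it contains. Labels are illustrative, not real annotations.
GLYCAN_SUBSTRUCTURES = {
    "G1": {"core", "core+GlcNAc"},
    "G2": {"core", "core+GlcNAc", "core+GlcNAc+Gal"},
    "G3": {"core", "core+Fuc"},
}


def substructure_abundance(glycoprofile, glycan_substructures):
    """Sum each glycan's abundance into every substructure it contains.

    Assumption for this sketch: substructure abundance is the total
    abundance of all measured glycans that contain the substructure.
    """
    totals = defaultdict(float)
    for glycan, abundance in glycoprofile.items():
        for sub in glycan_substructures[glycan]:
            totals[sub] += abundance
    return dict(totals)


# Two sparse glycoprofiles (relative abundances) that overlap only on G3.
sample_a = {"G1": 0.6, "G3": 0.4}
sample_b = {"G2": 0.7, "G3": 0.3}

profile_a = substructure_abundance(sample_a, GLYCAN_SUBSTRUCTURES)
profile_b = substructure_abundance(sample_b, GLYCAN_SUBSTRUCTURES)

# Substructures present in both samples, usable for direct comparison.
shared = sorted(set(profile_a) & set(profile_b))
print(shared)
```

At the glycan level the two toy samples overlap on a single feature (G3); at the substructure level they overlap on three, which is the sense in which accounting for shared biosynthetic steps reduces sparsity.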