Visualization, Quantification, and Alignment of Spectral Drift in Population Scale Untargeted Metabolomics Data
Author(s):
- Watrous, Jeramie D.
- Henglin, Mir
- Claggett, Brian
- Lehmann, Kim A.
- Larson, Martin G.
- Cheng, Susan
- Jain, Mohit
- et al.
Published Web Location: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5455767/
Untargeted liquid chromatography–mass spectrometry (LC-MS)-based metabolomics analysis of human biospecimens has become one of the most promising strategies for probing the underpinnings of human health and disease. Analysis of spectral data across population-scale cohorts, however, is precluded by day-to-day nonlinear drift in LC retention time and by batch effects, both of which complicate comparison of thousands of untargeted peaks. To date, there exists no efficient means of visualizing and quantitatively assessing signal drift, correcting drift when present, and automatically filtering unstable spectral features, particularly across the thousands of data files in population-scale experiments. Herein, we report the development of a set of R-based scripts for pre- and postprocessing of raw LC-MS data. These methods can be integrated with existing data analysis workflows: as an initial preprocessing step, they provide bulk nonlinear retention time correction at the raw data level. Further, this approach provides postprocessing visualization and quantification of peak alignment accuracy, as well as peak-reliability-based parsing of processed data through hierarchical clustering of signal profiles. In a metabolomics data set derived from ~3000 human plasma samples, we found that applying our alignment tools substantially improved peak alignment accuracy, automated data filtering, and, ultimately, statistical power for detecting metabolite correlates of clinical measures. These tools will enable metabolomics studies of population-scale cohorts.
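The abstract does not describe the alignment algorithm itself, and the authors' tools are R scripts. Purely as an illustration of what "bulk nonlinear retention time correction" can mean, the Python sketch below assumes that a handful of landmark (anchor) peaks have already been matched between one sample and a reference run. It builds a piecewise-linear, order-preserving warp from those anchors and applies it to every retention time in the file, plus a simple median-offset drift score. The names `build_rt_warp` and `rt_drift_score`, the anchor input format, and the example values are all hypothetical; this is not the authors' method.

```python
"""Hypothetical sketch: landmark-based nonlinear retention time (RT) warping.

Not the authors' R implementation; assumes anchor peaks have already been
matched between one sample and a reference run.
"""
from bisect import bisect_right
from statistics import median


def build_rt_warp(anchors):
    """Return a function mapping observed RT -> reference RT.

    anchors: (observed_rt, reference_rt) pairs for landmark peaks
    (hypothetical input format). The mapping is piecewise linear between
    anchors and extends the first/last segment beyond the anchor range.
    """
    pts = sorted(anchors)
    if len(pts) < 2:
        raise ValueError("need at least two anchor peaks")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    # Both axes must increase strictly so the warp preserves elution order.
    if any(b <= a for a, b in zip(xs, xs[1:])) or any(b <= a for a, b in zip(ys, ys[1:])):
        raise ValueError("anchor RTs must be strictly increasing on both axes")

    def warp(rt):
        # Index of the segment containing rt, clamped to the end segments.
        i = max(0, min(bisect_right(xs, rt) - 1, len(xs) - 2))
        x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
        return y0 + (rt - x0) * (y1 - y0) / (x1 - x0)

    return warp


def rt_drift_score(anchors):
    """Median absolute RT offset of anchors from the reference (a simple drift metric)."""
    return median(abs(obs - ref) for obs, ref in anchors)


if __name__ == "__main__":
    anchors = [(1.0, 1.5), (5.0, 5.5), (9.0, 10.0)]  # made-up example values, minutes
    warp = build_rt_warp(anchors)
    # Apply the same warp to every scan RT in the file (bulk correction).
    print([warp(rt) for rt in [0.0, 3.0, 7.0, 11.0]])  # [0.5, 3.5, 7.75, 12.25]
    print(rt_drift_score(anchors))  # 0.5
```

A piecewise-linear interpolant is used here only because it is short and passes exactly through the anchors; a production pipeline would more likely fit a smoother curve (for example LOESS), select anchors robustly, and refit the warp separately for each data file.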