eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

SIMS: Scalable, Interpretable Machine Learning for Single-Cell Annotation

Abstract

Experiments in molecular and cellular biology have become increasingly large and complex, with technological advances enabling high-resolution, multi-modal omics measurements at the level of individual cells. The capacity to readily collect these datasets has contributed to unprecedented biological insight and, concurrently, to a data deluge. Tasks such as cell annotation and cell state characterization increasingly necessitate automation, and while data-driven methods for inferring cell state from omics and image data are currently in development, a focus on robustness, scalability, and interpretability is paramount. We present SIMS: an end-to-end modeling pipeline for discrete morphological prediction of single-cell data with minimal boilerplate and high accuracy. We perform several studies using SIMS and show that the underlying model performs well in a variety of data-adverse conditions. Additionally, we show that SIMS performs well between tissue samples and outperforms one of the most popular cell type classification algorithms on several benchmark datasets. We also describe and implement a way to characterize classification outputs directly as a combination of sparse feature masks, allowing for interpretability at the level of individual samples. This interpretability recapitulates salient genes for classification both by label and globally across all samples. Finally, we show some use cases of SIMS for inference and argue that it will become a useful tool in the field of single-cell data analysis. All code is open source and available at github.com/jlehrer1/SIMS.
