Achieving Multi-scale Cell Morphology Clustering using Machine Learning
- An, Je
- Advisor(s): Yeo, Eugene M.; Pasquinelli, Amy E.
Abstract
Understanding the heterogeneity of cell types and gene expression in complex tissues is crucial for advancing single-cell genomics. Spatial transcriptomics deepens this understanding by adding spatial context, giving a more comprehensive view of cellular function and organization. Cell morphology and gene expression also influence each other: a cell's shape is known to affect its gene expression, and gene expression in turn shapes the cell. This reciprocity highlights the intricate relationship between a cell's physical structure and its molecular activity.
Despite efforts to apply morphogenomics to single-cell spatial data, existing methods struggle to scale efficiently to entire datasets. To address this limitation, I developed an autoencoder in PyTorch, a Python machine learning library, that reconstructs a 64x64 image of a cell mask. Because the encoded latent layer has far fewer dimensions than the input image, it makes multi-scale clustering of cell morphologies tractable. I demonstrate this approach on Xenium datasets, TissueNet datasets, and a pool of other publicly available datasets.
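The pipeline described above (encode each 64x64 mask into a small latent vector, then cluster those vectors at several granularities) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the convolutional layer sizes, the 16-dimensional latent space, the helper names `MaskAutoencoder`, `train`, `kmeans`, and `toy_masks`, and the use of k-means at several values of k as a stand-in for "multi-scale" clustering are all hypothetical choices. The synthetic disc-shaped masks are placeholders for real segmentation masks from Xenium or TissueNet.

```python
import torch
from torch import nn


class MaskAutoencoder(nn.Module):
    """Hypothetical convolutional autoencoder for 1x64x64 binary cell masks."""

    def __init__(self, latent_dim: int = 16):
        super().__init__()
        # Encoder: three stride-2 convolutions shrink 64 -> 32 -> 16 -> 8,
        # then a linear layer compresses the feature map to the latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, latent_dim),
        )
        # Decoder mirrors the encoder: 8 -> 16 -> 32 -> 64, one channel of logits.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)          # latent codes, shape (N, latent_dim)
        return self.decoder(z), z    # reconstruction logits (N, 1, 64, 64) and codes


def train(model, masks, epochs=20, lr=1e-3):
    """Full-batch training that minimizes per-pixel reconstruction loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        logits, _ = model(masks)
        loss = loss_fn(logits, masks)
        loss.backward()
        opt.step()
    return loss.item()


def kmeans(z, k, iters=50, seed=0):
    """Plain k-means on latent vectors; returns one cluster label per cell."""
    g = torch.Generator().manual_seed(seed)
    centers = z[torch.randperm(z.shape[0], generator=g)[:k]].clone()
    for _ in range(iters):
        labels = torch.cdist(z, centers).argmin(dim=1)
        for j in range(k):
            members = z[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(dim=0)
    return torch.cdist(z, centers).argmin(dim=1)


def toy_masks(n, seed=0):
    """Synthetic stand-in data: centered discs of random radius on a 64x64 grid."""
    g = torch.Generator().manual_seed(seed)
    yy, xx = torch.meshgrid(torch.arange(64), torch.arange(64), indexing="ij")
    dist2 = (yy - 32) ** 2 + (xx - 32) ** 2
    radii = torch.randint(8, 28, (n,), generator=g)
    masks = (dist2[None] <= radii[:, None, None] ** 2).float()
    return masks[:, None]  # add channel dimension -> (n, 1, 64, 64)


torch.manual_seed(0)
masks = toy_masks(32)
model = MaskAutoencoder(latent_dim=16)
final_loss = train(model, masks)
with torch.no_grad():
    _, z = model(masks)
# "Multi-scale" here means clustering the same latent space at several granularities.
clusters = {k: kmeans(z, k) for k in (2, 4, 8)}
```

The design point the sketch illustrates is that clustering runs on the 16-dimensional codes rather than the 4,096 raw pixels, so re-clustering at a coarser or finer granularity is cheap once the encoder has been trained. A real pipeline would likely swap in a graph-based method (for example, Leiden at several resolutions) and mini-batch training over the full dataset.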
To promote reproducibility and usability, the autoencoder is designed to integrate seamlessly with Bento, a computational toolkit developed in our lab. Making the autoencoder part of Bento's diverse portfolio of software analysis tools maximizes its functionality and accessibility for further research and analysis.