Accurately predicting cellular activities of proteins based on their primary amino acid sequences would greatly improve our understanding of the proteome. We present CELL-E, a text-to-image transformer model that generates 2D probability density images describing the spatial distribution of proteins within cells. Given an amino acid sequence and a reference image of cell or nucleus morphology, CELL-E predicts a more refined representation of protein localization than previous in silico methods, which rely on pre-defined, discrete class annotations of protein localization to subcellular compartments.