Motivated by (i) nature’s ability to perform reliable, efficient computation with stochastic components, (ii) the end of Moore’s Law (and other associated scaling laws) for our current computational paradigm, and (iii) the exponentially increasing amount of data (especially image data) generated over the last decade, we examine the ability of analog-valued emerging memory devices to directly store analog-valued data. Specifically, we first recast the problem of data storage as a communication problem. Using tools from analog communications, and with Phase Change Memory (PCM) as a prototypical multi-level storage technology, we show that analog-valued emerging memory devices can achieve higher capacities when paired with analog codes. Further, we show that storing analog signals directly through joint coding can achieve low distortion with reduced coding complexity. We then scale the problem up to storing natural images on a simulated array of PCM devices. Here, we construct an autoencoder framework in which the encoder and decoder are neural networks whose parameters are trained end-to-end to minimize distortion for a fixed number of devices. We show that the autoencoder achieves better rate-distortion performance than a scheme that separates JPEG source coding from binary channel coding. Next, we demonstrate, this time experimentally, an image storage and compression task by storing analog image data directly onto an analog-valued Resistive RAM (RRAM) array. A neural-network-based joint source-channel coding algorithm is developed to encode and retrieve natural images. This adaptive joint source-channel coding method is resilient to RRAM array non-idealities such as cycle-to-cycle and device-to-device variations, time-dependent variability, and non-functional storage cells, while achieving a reasonable reconstruction performance of ~20 dB using only 0.1 devices/pixel for the analog image.
Finally, to explicitly address device-to-device variation and drift, we use data from a commercial fabrication facility at TSMC and present preliminary results demonstrating an effective drift model capable of inferring values stored at previous times.