While subjective visual experience is remarkably stable and coherent, the underlying sensory data are incomplete and heavily shaped by the eyes' saccadic rhythm. In this work, we show that a deep recurrent neural network can effectively reconstruct vivid images from restricted retinal inputs during active vision. Our method includes the creation of a dataset of synthetic retinal inputs containing intensity, color, and event-camera-generated motion data. We demonstrate the importance of both long short-term memory and corollary discharge signals for image stabilization, as well as the system's sensitivity to noise, consistent with recent experimental findings. Our study contributes to the advancement of realistic, dynamic models of image reconstruction, providing insight into the complexities of active visual perception.
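To make the described architecture concrete, the following is a minimal sketch, not the authors' implementation, of how a long short-term memory network can integrate a sequence of restricted retinal samples with a corollary discharge (efference copy) signal to decode a stable full image. The PyTorch framing, class name `RecurrentReconstructor`, and all layer sizes are illustrative assumptions.

```python
# Minimal illustrative sketch (assumed, not the paper's code): an LSTM
# accumulates evidence across fixations, combining each restricted retinal
# sample with a corollary discharge vector, and decodes a full-image estimate.
import torch
import torch.nn as nn


class RecurrentReconstructor(nn.Module):
    def __init__(self, patch_dim=256, cd_dim=2, hidden_dim=512, image_dim=4096):
        super().__init__()
        # Encode one fixation's retinal sample (intensity/color/event channels,
        # flattened) together with the 2-D corollary discharge vector.
        self.encoder = nn.Linear(patch_dim + cd_dim, hidden_dim)
        # Recurrent state integrates information across saccades.
        self.lstm = nn.LSTMCell(hidden_dim, hidden_dim)
        # Decode the hidden state into the current full-image estimate.
        self.decoder = nn.Linear(hidden_dim, image_dim)

    def forward(self, patches, corollary_discharge):
        # patches: (T, B, patch_dim) retinal samples, one per fixation
        # corollary_discharge: (T, B, cd_dim) efference copies of eye movements
        T, B, _ = patches.shape
        h = patches.new_zeros(B, self.lstm.hidden_size)
        c = patches.new_zeros(B, self.lstm.hidden_size)
        estimates = []
        for t in range(T):
            x = torch.cat([patches[t], corollary_discharge[t]], dim=-1)
            h, c = self.lstm(torch.relu(self.encoder(x)), (h, c))
            estimates.append(self.decoder(h))
        # Return the reconstruction after every fixation.
        return torch.stack(estimates)


# Toy usage: 8 fixations, batch of 4, a 64x64 target image flattened to 4096.
model = RecurrentReconstructor()
patches = torch.randn(8, 4, 256)
cd = torch.randn(8, 4, 2)
recon = model(patches, cd)  # (8, 4, 4096)
```

Feeding the corollary discharge alongside each retinal sample is what lets the recurrent state compensate for self-generated motion; ablating that input in such a model would leave the reconstruction to drift with each saccade.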