Spectral imaging holds great promise for the non-invasive diagnosis of retinal diseases. However, conventional spectral cameras must scan extensively to acquire a spectral datacube, which prolongs acquisition. This makes them ill-suited to retinal imaging, where rapid eye movements occur during the scan. To address this problem, we built a coded aperture snapshot spectral imaging (CASSI) fundus camera that captures a large spectral datacube in a single exposure. To reconstruct high-resolution images from the snapshot measurement, we developed a robust deep unfolding algorithm that uses a state-of-the-art spectral transformer as its denoising network. We demonstrated the system's performance through several experiments: imaging standard targets, imaging an eye phantom, and in vivo imaging of the human retina.