In medical diagnosis, a sequence of images is often captured to observe changes in a patient's health status. However, an image sequence taken over several years usually suffers from severe deformation, which makes it time-consuming for physicians to match corresponding patterns. In this paper, we propose a coarse-to-fine pipeline for retinal image registration based on convolutional neural networks. By leveraging the pipeline's three components, namely feature matching, outlier rejection, and local registration, we recover the deformation and accurately align multi-temporal image sequences. Experimental results show that the proposed method is robust to severe deformation as well as to illumination and contrast variations. With the proposed registration pipeline, changes in image patterns over time can be identified through visual analysis.
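The coarse stage of such a pipeline can be illustrated with a minimal sketch. It assumes that putative keypoint matches between two images are already available, for example from a feature-matching network. The abstract does not say how outliers are rejected or which global transform is estimated, so this sketch substitutes classical RANSAC with a 2D affine model. The function names `fit_affine`, `apply_affine`, and `ransac_affine` are hypothetical. The fine stage, local registration of the residual nonrigid deformation, is not shown.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2D affine transform M (3 x 2) such that [src, 1] @ M ~= dst."""
    ones = np.ones((src.shape[0], 1))
    A = np.hstack([src, ones])                      # N x 3 homogeneous coordinates
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)     # solve A @ M = dst
    return M


def apply_affine(M, pts):
    """Map N x 2 points through the affine transform M."""
    return np.hstack([pts, np.ones((pts.shape[0], 1))]) @ M


def ransac_affine(src, dst, n_iters=500, thresh=3.0, seed=0):
    """Hypothetical outlier rejection: RANSAC over minimal 3-point samples.

    Returns the affine transform refit on the largest inlier set, together
    with a boolean mask marking which matches were kept as inliers.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=3, replace=False)   # minimal sample
        M = fit_affine(src[idx], dst[idx])
        residuals = np.linalg.norm(apply_affine(M, src) - dst, axis=1)
        inliers = residuals < thresh                        # pixel threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best hypothesis for a more stable estimate.
    M = fit_affine(src[best_inliers], dst[best_inliers])
    return M, best_inliers
```

In this sketch the threshold `thresh` is expressed in pixels. The refit on the full inlier set gives a coarse global alignment, and a local registration step would then refine that alignment.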