The purpose of this thesis is to present a novel method for learning to generate an image sequence from a single input image without sequentially aligned data. Given examples of a visual phenomenon that can be divided into discrete time steps, the problem is to learn a model that takes an input from any such time step and realizes it at all other time steps in the sequence. For example, given a scenery picture taken in spring, the model should output the corresponding pictures for summer, fall, and winter in sequence, without changing the overall layout and semantic content of the input picture. Furthermore, it is assumed that ground-truth aligned sequences are not provided. This broadens the real-world applicability of the method, since aligned sequential data is often difficult to collect. This task generalizes the unpaired image-to-image translation problem from generating pairs to generating sequences and associates a direction of time with the observed phenomenon.
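As an illustrative formalization (the notation here is a sketch; the precise formulation is given later in the thesis), suppose the phenomenon is divided into $N$ discrete time steps with unpaired image collections $X_1, \dots, X_N$. The goal is then to learn mappings
\[
G_i : X_i \to X_{i+1}, \qquad i = 1, \dots, N \ (\text{indices taken modulo } N),
\]
so that for an input image $x \in X_i$, the successive compositions $G_i(x),\ G_{i+1}(G_i(x)),\ \dots$ realize $x$ at the remaining time steps, without any ground-truth correspondence between images in different collections.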
We show that this problem can be solved by combining Generative Adversarial Networks (GANs), a popular deep unsupervised learning technique, with a periodic assumption about the sequential visual phenomenon being modeled. The periodic assumption is enforced during training by a novel Loop Consistency loss, inspired by the Cycle Consistency loss that has achieved great success in unpaired image-to-image translation. The two parts of the model, the adversarial loss and the Loop Consistency loss, can be seen as two levels of constraints that together facilitate training. The transformation unit itself is a neural network. We show the effects of different architectural choices on generation quality and compare the model's results against several competitive baselines.
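As a minimal sketch of how the periodic assumption enters the training objective (again, the notation is illustrative and the exact loss is defined in the body of the thesis), the Loop Consistency loss penalizes the discrepancy between an input image and its reconstruction after one full traversal of the $N$-step loop:
\[
\mathcal{L}_{\text{loop}} = \mathbb{E}_{x \sim X_i} \big[ \, \| (G_{i+N-1} \circ \cdots \circ G_{i+1} \circ G_i)(x) - x \|_1 \, \big].
\]
Whereas Cycle Consistency enforces reconstruction after a forward and a backward mapping between two domains, here the reconstruction constraint closes only after the entire sequence has been traversed, which ties the learned transformations to a consistent direction of time.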