Content creation aims to make information, such as ideas and thoughts, accessible to an audience through mediums such as posters, movies, advertisements, and other forms of art. However, effectively conveying such information is a Herculean task for people without years of professional experience. In recent years, visual content creation techniques based on generative adversarial networks (GANs) have made significant progress in facilitating the content creation process. By learning from data, these approaches produce appealing content creation and editing results. Nevertheless, it remains challenging for GAN-based methods to 1) provide a controllable creation/editing process for the user, and 2) learn content creation tasks from limited data. In this thesis, we investigate several solutions to address these issues. We first propose a multi-modal image-to-image translation framework that learns from unpaired data. We then leverage image-to-image translation approaches to model the professional workflow of a particular content creation task. We also design a differentiable retrieval scheme that makes the conditional image synthesis model more controllable. Finally, we propose a regularization term for training GAN models with limited data.
First, we propose an image-to-image translation framework that can 1) produce one-to-many translation results and 2) learn from unpaired data across various image domains. Specifically, the model disentangles images into a domain-invariant content space and a domain-specific attribute space to enable one-to-many translation, while a cross-cycle consistency loss allows the model to be trained with unpaired data. Furthermore, we design a mode-seeking regularization term to increase the diversity of the generated images.
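To make the mode-seeking idea concrete, one way such a term can be written (the notation here is illustrative and may differ from the formulation in the thesis) is the following: given a shared content code $c$ and two attribute codes $z_1$ and $z_2$, the generator $G$ is encouraged to maximize

\[
\mathcal{L}_{\mathrm{ms}} = \frac{d_I\big(G(c, z_1),\, G(c, z_2)\big)}{d_z(z_1, z_2)},
\]

where $d_I(\cdot,\cdot)$ and $d_z(\cdot,\cdot)$ denote distance metrics in the image and latent spaces, respectively, so that nearby attribute codes cannot collapse to the same output image.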
Second, we construct a system capable of modeling the progressive workflow of a particular type of artwork. By leveraging a series of image-to-image translation networks to learn the progressive creation steps, our model enables both multi-stage image generation and multi-stage editing of an existing piece of art. Moreover, we design an optimization process that addresses the challenge of faithfully reconstructing the input artwork in the editing scenario.
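As a minimal sketch of the multi-stage design (assuming one generator and one latent code per stage; the names and interfaces are illustrative, not the thesis implementation):

```python
# Illustrative sketch: multi-stage generation by chaining one
# image-to-image translation network per creation stage.
import torch.nn as nn

class MultiStagePipeline(nn.Module):
    def __init__(self, stage_generators):
        super().__init__()
        # One generator per workflow stage, e.g., sketch -> flat color -> detail.
        self.stages = nn.ModuleList(stage_generators)

    def forward(self, x, latents):
        # Each stage refines the previous output, conditioned on a per-stage
        # latent code that controls stage-specific variations.
        outputs = []
        for stage, z in zip(self.stages, latents):
            x = stage(x, z)
            outputs.append(x)
        # Returning the intermediate results is what enables stage-wise editing.
        return outputs
```

Exposing the intermediate outputs in this way allows a user to edit the artwork at any stage of the workflow and propagate the change through the remaining stages.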
Third, we develop a differentiable retrieval approach that selects real image patches as references for the image synthesis task. With this differentiable retrieval approach, we can 1) make the entire pipeline (retrieval and generation) end-to-end trainable and 2) encourage the selection of mutually compatible patches for the same generated image. Furthermore, we design two auxiliary loss functions to facilitate the training of the retrieval function.
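A common way to make such a discrete selection step differentiable is the Gumbel-softmax relaxation; the sketch below illustrates this idea, with the function names, tensor shapes, and the straight-through estimator all being our assumptions rather than the exact implementation.

```python
# Illustrative sketch: differentiable patch retrieval via the
# Gumbel-softmax relaxation (names and shapes are assumptions).
import torch
import torch.nn.functional as F

def retrieve_patches(query_feats, bank_feats, tau=1.0):
    """Select one patch per query from a feature bank, differentiably.

    query_feats: (num_queries, dim) features of the regions to synthesize.
    bank_feats:  (bank_size, dim) features of candidate real image patches.
    """
    # Similarity logits between each query and every candidate patch.
    logits = query_feats @ bank_feats.t()  # (num_queries, bank_size)
    # hard=True yields one-hot selections in the forward pass while the
    # backward pass uses the soft relaxation (straight-through estimator),
    # so gradients flow through the retrieval step during training.
    weights = F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)
    # The weighted sum reduces to picking the selected patch's features.
    return weights @ bank_feats  # (num_queries, dim)
```

Because the selection remains inside the computation graph, the losses applied to the generated image can also shape the retrieval function, which is what encourages mutually compatible patch choices.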
Finally, we propose a regularization term for training GAN models on limited data. We show both theoretically and empirically that our scheme 1) improves the generalization performance and stabilizes the learning dynamics of GAN models under limited training data, and 2) complements recent data augmentation solutions.
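As a sketch of how such a term can be formulated (the notation is illustrative): let $\alpha_R$ and $\alpha_F$ be exponential moving averages of the discriminator's predictions on real and generated images, respectively; the discriminator loss is then augmented with an anchor term of the form

\[
R_{\mathrm{reg}} = \mathbb{E}_{x \sim \mathcal{T}}\big[\lVert D(x) - \alpha_F \rVert^2\big] + \mathbb{E}_{z \sim p_z}\big[\lVert D(G(z)) - \alpha_R \rVert^2\big],
\]

which penalizes the discriminator for drifting far from its recent predictions and thereby dampens overfitting to the small training set $\mathcal{T}$.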