
UC Merced Electronic Theses and Dissertations

Immersive Scene Creation Without Boundaries

Creative Commons 'BY-NC' version 4.0 license
Abstract

Recent advances in visual content creation enable many applications in virtual reality (VR), artistic asset creation, and movie production. As researchers, we naturally seek to push the boundaries of existing technologies. These boundaries can take many forms, but I dedicate my work to recreating the beauty of real-world scenery in the digital world. From the long-standing 2D image synthesis problem to the recently popular neural rendering paradigm, my research aims to identify the boundaries, set by data or training methods, that prevent existing frameworks from recreating the real world in virtual environments. In this thesis, I introduce our exploration and findings along this line of research.

We start with the problem of image synthesis, one of the most actively studied domains in computer vision. The problem not only provides an intuitive way to probe how neural networks understand our world, but also supports a wide variety of content creation applications, from image/video editing and 3D shape construction to synthesizing other modalities such as text and voice. Despite these tremendous achievements, we observe that the modern problem setup is largely dedicated to object-centric datasets and applications. This shared limitation motivates us to seek the next generation of generative models that can synthesize unbounded scenes, in 2D or even 3D.

First, we focus on a formally defined computer vision task, image outpainting, in which a model or framework synthesizes coherent image content beyond a given image patch. Although prior image-to-image translation methods yield preliminary results, they still perform poorly on this task due to various failures in both the encoder and the decoder. These failures impair the final outpainting quality, which falls far short of the image quality commonly observed in unconditional image synthesis. Motivated by these observations, we instead tackle outpainting with an optimization-based variable search over a pretrained generative model. This approach avoids decoder problems, since the decoder is left untouched throughout the process, while encoding error is mitigated by a much more accurate optimization process. We show that the proposed pipeline achieves state-of-the-art performance on image outpainting benchmarks.
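
As a concrete illustration of this idea, the following is a minimal sketch of optimization-based outpainting against a frozen, pretrained generator. The generator G, the latent dimensionality, and the hyperparameters are illustrative assumptions rather than the exact configuration used in this work; the point is simply that only the latent code is optimized, so the decoder is never modified and cannot introduce new artifacts.

```python
# Minimal sketch: outpainting by searching the latent space of a frozen,
# pretrained generator G (hypothetical). Only the latent code is optimized;
# the known pixels act as the reconstruction target, and the generator's
# prior fills in the rest of the canvas.
import torch

def outpaint_by_latent_search(G, known_patch, mask, latent_dim=512,
                              steps=500, lr=0.05):
    """known_patch: (1, 3, H, W) image, valid where mask == 1; mask: (1, 1, H, W)."""
    device = known_patch.device
    z = torch.randn(1, latent_dim, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        full_image = G(z)  # synthesize the full canvas from the current latent
        # Penalize mismatch only on the observed region; the unobserved
        # region is left entirely to the generator prior.
        recon = ((full_image - known_patch) ** 2 * mask).sum() / mask.sum()
        recon.backward()
        opt.step()

    with torch.no_grad():
        return G(z)  # final outpainted result
```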

Second, we propose a new generative modeling task, infinite-pixel image synthesis, which aims to train a generator that synthesizes arbitrarily large images from a collection of finite-pixel images under a limited computational budget. The proposed model, named InfinityGAN, can synthesize images of effectively infinite size with diverse content. Combined with the aforementioned optimization-based image outpainting, InfinityGAN can instantly perform infinite-pixel outpainting and achieves another new state of the art. We further demonstrate additional applications, such as style fusion, image in-betweening, and inference speed-up via parallel inference.
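
To illustrate the inference side of such a model, the sketch below tiles a large canvas from fixed-size patches produced by a spatially conditioned generator. G_patch and its coordinate interface are hypothetical stand-ins, not the actual InfinityGAN architecture; the sketch only shows why per-call compute stays constant while the output grows, and why tiles can be generated independently.

```python
# Sketch of coordinate-conditioned, patch-wise inference for arbitrarily
# large images. G_patch is a hypothetical spatially conditioned generator:
# given a shared global latent and a patch coordinate, it emits one
# fixed-size tile. Memory and compute per call stay constant regardless of
# how large the assembled canvas becomes.
import torch

def synthesize_large_image(G_patch, z_global, rows, cols, tile=128):
    canvas = torch.zeros(3, rows * tile, cols * tile)
    for r in range(rows):
        for c in range(cols):
            coords = torch.tensor([r, c], dtype=torch.float32)
            with torch.no_grad():
                tile_img = G_patch(z_global, coords)  # (3, tile, tile)
            canvas[:, r * tile:(r + 1) * tile, c * tile:(c + 1) * tile] = tile_img
    return canvas
```

Because each tile depends only on the shared latent and its own coordinates, the two loops can be dispatched across multiple devices, which is the parallel-inference speed-up mentioned above.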

Third, we extend InfinityGAN to 3D content synthesis. We build a new generative pipeline, named InfiniCity, for synthesizing unboundedly large 3D city scenes. InfiniCity first creates the global structure of a scene through infinite-pixel satellite image synthesis with InfinityGAN, and then uses these global structure cues to render realistic street-view images and videos. We show that InfiniCity synthesizes 3D-grounded, consistent, and realistic city scenes at arbitrary scales, a combination of quality and scale that no existing method achieves.
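
One intermediate step implied by a satellite-to-street-view pipeline is lifting a synthesized top-down map into coarse 3D structure. The sketch below converts a per-pixel height map into a voxel occupancy grid; the height-map input and the grid resolution are illustrative assumptions, not necessarily how InfiniCity represents its geometry.

```python
# Sketch: lift a top-down (satellite-style) height map into a voxel
# occupancy grid that street-view rendering could consume. Heights are
# assumed to be given in voxel units; both the input and resolution are
# illustrative assumptions.
import numpy as np

def heightmap_to_voxels(height_map, max_height=32):
    """height_map: (H, W) integer heights -> (H, W, max_height) boolean occupancy grid."""
    z = np.arange(max_height).reshape(1, 1, -1)
    heights = np.clip(height_map, 0, max_height)[..., None]
    # Each column (i, j) is occupied from the ground up to its height value.
    return z < heights
```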

Lastly, we discuss the problem of applying generative models to neural-rendering-based scene reconstruction. In particular, we tackle a radiance field inpainting problem in which generative models hallucinate the appearance of unobserved regions. We show that modern image-based inpainting models exhibit unavoidable shortcomings when handling 3D tasks, and we propose several techniques to enhance 3D consistency. The combination of our proposed methods leads to state-of-the-art performance on the 3D inpainting problem, with substantial improvements in textural consistency and realism.
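
To make the 3D-consistency issue concrete: a 2D inpainting model hallucinates each view independently, so its outputs disagree in the masked regions, and a radiance field trained on them with a strict per-pixel loss averages those disagreements into blurry texture. The sketch below shows one generic way to tolerate such disagreement, a masked, mixed-objective supervision in which observed pixels are matched exactly while hallucinated pixels are matched only at the level of local statistics. The loss design and weights here are illustrative assumptions, not the specific techniques proposed in this thesis.

```python
# Sketch: masked, mixed-objective supervision for fitting a 3D scene model
# to per-view inpainted images. Observed pixels get a strict per-pixel loss;
# hallucinated pixels (which may disagree across views) are matched only via
# pooled local statistics, letting the 3D model settle on one consistent
# explanation. All weights and pooling sizes are illustrative assumptions.
import torch
import torch.nn.functional as F

def supervision_loss(rendered, target, inpaint_mask, w_inpaint=0.1):
    """rendered/target: (1, 3, H, W); inpaint_mask: (1, 1, H, W), 1 = hallucinated."""
    observed = 1.0 - inpaint_mask
    diff = (rendered - target) ** 2
    # Strict reconstruction where the pixels were actually captured.
    loss_obs = (diff * observed).sum() / observed.expand_as(diff).sum().clamp(min=1.0)
    # Statistics-level match (local means via average pooling) where pixels
    # were hallucinated, tolerating per-view disagreement in fine texture.
    r_stat = F.avg_pool2d(rendered * inpaint_mask, 8)
    t_stat = F.avg_pool2d(target * inpaint_mask, 8)
    loss_inp = F.mse_loss(r_stat, t_stat)
    return loss_obs + w_inpaint * loss_inp
```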