eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Generative Models for Image and Long Video Synthesis

Abstract

In this thesis, I present essential ingredients for making image and video generative models useful for general visual content creation through three contributions. First, I will present research on long video generation. This work proposes a network architecture and training paradigm that enables learning long-term temporal patterns from videos, a key challenge to advancing video generation from short clips to longer-form coherent videos. Next, I will present research on generating images of scenes conditioned on human poses. This work showcases the ability of generative models to represent relationships between humans and their environments, and emphasizes the importance of learning from large and complex datasets of daily human activity. Lastly, I will present a method for teaching generative models to follow image editing instructions by combining the abilities of large language models and text-to-image models to create supervised training data. Following instructions is an important step that will allow generative models of visual data to become more helpful to people. Together these works advance the capabilities of generative models for synthesizing images and long videos.
