Scalable Binding
- Jabri, Allan Anwar
- Advisor(s): Efros, Alexei A
Abstract
Any useful agent will face many tasks and must rely on transfer of prior knowledge acquired in a scalable manner. This thesis explores inductive biases that enable scalable pre-training of representations -- and algorithms that bind them -- from the design of architectures capable of adaptive computation for scalable generative modeling, to self-supervised objectives that prepare embodied agents with mechanisms for state representation and reward maximization.
First, I consider the challenge of gracefully scaling generative models to high-dimensional data, motivating the importance of adaptive computation, a property missing from predominant architectures. This leads to a simple attention-based architecture for diffusion models capable of dedicating computation adaptively across its input and output, attaining superior performance in image and video generation despite being more domain-agnostic and efficient. Visualizations of read attention demonstrate how the model learns to dedicate computation to the more complex parts of samples; for example, in settings with high redundancy such as video prediction, it learns to simply copy information where appropriate and to focus computation on the more complex dynamics.
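To make the adaptive read-process-write pattern concrete, below is a minimal PyTorch sketch of one block of a latent-bottleneck attention architecture: a small set of latent tokens reads from the input via cross-attention, computes among themselves, and writes updates back. This is a hedged illustration of the general mechanism, not the thesis's actual model; the class name, dimensions, and single-block structure are assumptions for exposition.

```python
import torch
import torch.nn as nn

class ReadProcessWrite(nn.Module):
    """One block of a latent-bottleneck attention architecture (illustrative).

    A small set of learned latent tokens 'reads' from the full input token
    set via cross-attention, performs computation among themselves, then
    'writes' updates back to the input tokens. Because reading is
    attention-based, latent capacity flows to whichever parts of the
    input need it most.
    """

    def __init__(self, dim: int, num_latents: int, num_heads: int = 4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.read = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.process = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.write = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b = x.shape[0]
        z = self.latents.unsqueeze(0).expand(b, -1, -1)
        # Read: latents attend to input tokens (cost linear in input length).
        z = z + self.read(z, x, x, need_weights=False)[0]
        # Process: self-attention among the few latents (cheap).
        z = z + self.process(z, z, z, need_weights=False)[0]
        # Write: input tokens attend to latents to receive updates.
        x = x + self.write(x, z, z, need_weights=False)[0]
        return self.norm(x)

# 1024 input tokens, but the heavy computation involves only 64 latents.
block = ReadProcessWrite(dim=128, num_latents=64)
out = block(torch.randn(2, 1024, 128))  # -> (2, 1024, 128)
```

The point of the design is that computation among latents is decoupled from input size, so most of the network's depth can operate on the small latent set regardless of resolution.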
Next, I show how self-supervised objectives that exploit more domain knowledge can be used to efficiently solve related downstream tasks. In the domain of perception, I show how a simple self-supervised objective for space-time attention can be used to solve a range of tasks involving temporal correspondence and object permanence, central challenges in state representation for embodied agents. In the domain of reinforcement learning, I motivate the importance of scalably constructing task distributions and demonstrate how meta-reinforcement learners -- and their underlying mechanisms for exploration and stimulus-reward binding -- can be pre-trained with self-supervised reward models.
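The space-time attention objective admits a compact illustration: a soft "walker" steps forward across frames using attention over feature affinities, then steps back, and is trained so that each node returns to itself, which forces the features to encode temporal correspondence without labels. The sketch below is a toy version of this family of cycle-consistency objectives; the function name, tensor shapes, and temperature are illustrative assumptions, not the thesis's exact formulation.

```python
import torch
import torch.nn.functional as F

def cycle_walk_loss(feats: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Cycle-consistency loss for a soft walk over space-time features.

    feats: (T, N, D) tensor of per-frame node embeddings
           (T frames, N nodes per frame, D feature dimensions).
    """
    feats = F.normalize(feats, dim=-1)  # cosine-similarity affinities
    T, N, _ = feats.shape
    walk = torch.eye(N)
    # Walk forward frame-to-frame via row-stochastic attention...
    for t in range(T - 1):
        walk = walk @ F.softmax(feats[t] @ feats[t + 1].T / temperature, dim=-1)
    # ...then walk back to the first frame.
    for t in range(T - 1, 0, -1):
        walk = walk @ F.softmax(feats[t] @ feats[t - 1].T / temperature, dim=-1)
    # Each node should return to itself: maximize the round-trip diagonal.
    return -torch.log(torch.diagonal(walk) + 1e-8).mean()

# 4 frames, 49 nodes each (e.g. a 7x7 grid of patches), 128-d features.
feats = torch.randn(4, 49, 128, requires_grad=True)
cycle_walk_loss(feats).backward()
```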
Finally, I conclude with a perspective on open problems in scalable pre-training, with a focus on the interplay between transfer across modalities, universal generative modeling objectives for discrete and continuous data, and adaptive computation.