eScholarship
Open Access Publications from the University of California

UCLA Electronic Theses and Dissertations

Self-Organizing Generative Models for Diverse Imitation Learning

Abstract

The goal of AI is to enable computers to analyze and understand the world through algorithms. Generative models, a class of algorithms that can produce data with the same distribution as observed data, are among the most versatile tools for creating such understanding. Among all possible models that could approximate a given data distribution, those that are constrained, and hence forced to learn the essence of how natural data is created, are to be preferred. In this thesis, we propose a generative model based on the principle of sampling and searching over a low-dimensional latent representation of the data. We name it the Self-Organizing Generative Model (SOG) due to its natural clustering of similar data points in latent representation space and its connection to classical Self-Organizing Maps. We provide a theoretical analysis showing that this model adopts an Expectation-Maximization (EM) approach to maximizing the data likelihood. We demonstrate through various experiments that, unlike several existing models, SOG successfully learns all modes of the data distribution.

This property enables us to address the imitation of diverse behaviors performed by a robot or agent in a Markov Decision Process domain. More specifically, we focus on the task of imitation learning, which aims to replicate an expert policy from demonstrations without access to any reward function that might have motivated the expert. This task becomes particularly challenging when the expert exhibits a mixture of behaviors. To model variations in the expert policy, prior work has added latent variables to existing imitation learning frameworks; our experiments show that these existing methods fail to imitate all modes faithfully.

To tackle this problem, we incorporate our generative model, SOG, into behavior cloning (BC), a supervised method for imitating expert behavior. The generative process of SOG adds diversity to BC, and the resulting model, SOG-BC, shows promise in accurately distinguishing and imitating different modes of behavior. Another imitation learning model, GAIL, adopts an adversarial approach and considers long-term dynamics in its optimization; it therefore alleviates the compounding errors inherent in BC and is more robust to noise. To make the policy imitated by SOG-BC robust to compounding errors at unseen states, we integrate it with GAIL. We show that our method significantly outperforms the state of the art across multiple experiments.
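To make the sampling-and-search principle concrete, below is a minimal, hypothetical sketch of one training iteration in PyTorch. The decoder architecture, candidate count, and squared-error objective are illustrative assumptions rather than the thesis implementation: for each data point, candidate latent codes are sampled and the best-matching one is kept (the E-step analogue), and the decoder is then fit to those selected codes (the M-step analogue).

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Maps a low-dimensional latent code to data space (illustrative)."""
    def __init__(self, latent_dim=2, data_dim=784, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, data_dim),
        )

    def forward(self, z):
        return self.net(z)

def best_latents(decoder, x, latent_dim, n_candidates=64):
    """E-step analogue: sample candidate latent codes and, for each data
    point, keep the code whose decoding is closest to that point."""
    with torch.no_grad():
        z = torch.randn(n_candidates, latent_dim)   # candidate codes
        recon = decoder(z)                          # (n_candidates, data_dim)
        dists = torch.cdist(x, recon)               # (batch, n_candidates)
        idx = dists.argmin(dim=1)                   # best candidate per point
    return z[idx]

def train_step(decoder, opt, x, latent_dim=2):
    """M-step analogue: fit the decoder to the selected codes."""
    z = best_latents(decoder, x, latent_dim)
    loss = ((decoder(z) - x) ** 2).mean()           # reconstruction objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In this sketch, `train_step` would be called repeatedly over mini-batches with an optimizer such as `torch.optim.Adam(decoder.parameters())`; because similar data points tend to select nearby codes, the latent space clusters naturally, which is the "self-organizing" behavior the abstract alludes to.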
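Similarly, the following hypothetical sketch suggests how the same latent-code search might be combined with behavior cloning, so that one code comes to represent one expert behavior mode. All names, shapes, and hyperparameters are illustrative assumptions, and the trajectory-level search is a simplification, not the SOG-BC algorithm itself.

```python
import torch
import torch.nn as nn

class LatentPolicy(nn.Module):
    """BC policy conditioned on a latent 'behavior mode' code (illustrative)."""
    def __init__(self, obs_dim, act_dim, latent_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))

def latent_bc_step(policy, opt, obs, acts, latent_dim=2, n_candidates=32):
    """One BC update on a single expert trajectory (obs, acts).
    Search: pick the code under which the policy best explains the whole
    trajectory, so each code specializes to one behavior mode."""
    with torch.no_grad():
        zs = torch.randn(n_candidates, latent_dim)
        errs = torch.stack([
            ((policy(obs, z.expand(len(obs), -1)) - acts) ** 2).mean()
            for z in zs
        ])
        z = zs[errs.argmin()]
    pred = policy(obs, z.expand(len(obs), -1))
    loss = ((pred - acts) ** 2).mean()   # standard BC regression loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The abstract's further step, integrating such a latent-conditioned policy with GAIL's adversarial objective to curb compounding errors, is not shown here.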
