Discovering Useful Behaviour in Reinforcement Learning
- Mavalankar, Aditi Ashutosh
- Advisor(s): Saul, Lawrence
Abstract
Applying reinforcement learning techniques to real-world problems, as well as to long-standing challenges, has seen major successes in the last few years. In the reinforcement learning setting, an agent interacts with the environment, which gives it feedback in the form of a scalar reward signal. This reward signal may be available to the agent at every step, or it may become available only after the agent has completed several subtasks in succession. Thus, it may be desirable for the agent to extract useful information from its interactions with the environment that it can reuse to expedite learning.
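To make the interaction loop described above concrete, here is a minimal sketch in Python: a toy environment with a sparse reward (the agent receives feedback only upon reaching a goal), driven by a random placeholder policy. The environment, its `reset`/`step` interface, and all names are illustrative assumptions for this page, not code from the dissertation.

```python
import random

class LineWorld:
    """Toy environment (hypothetical): the agent walks along a line to a goal."""

    def __init__(self, size=10):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right), clipped to the line
        self.state = max(0, min(self.size - 1, self.state + action))
        done = self.state == self.size - 1
        reward = 1.0 if done else 0.0  # sparse scalar reward: only at the goal
        return self.state, reward, done

env = LineWorld()
state = env.reset()
total_reward = 0.0
for t in range(100):
    action = random.choice([-1, 1])     # placeholder for a learned policy
    state, reward, done = env.step(action)
    total_reward += reward              # scalar feedback from the environment
    if done:
        break
print(f"episode finished after {t + 1} steps, return = {total_reward}")
```

With a per-step (dense) reward the agent is guided at every transition; in the sparse setting above it sees no feedback until the goal is reached, which is exactly the situation that motivates reusing information extracted from earlier interactions.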
In this dissertation, we will discuss three approaches for an agent to learn useful behaviour. Each of these approaches extracts a different kind of useful information from the environment and uses it in a manner consistent with the reinforcement learning objective: maximizing reward. The first approach discovers useful *representations* of the observation space by exploiting symmetries in the agent and environment. The second approach discovers useful *skills* that can be used in combination with each other to solve complex tasks. The third approach discovers useful *states* that can serve as good starting points for the agent to explore in very large environments. We suggest that all of these ways of discovering useful behaviour will be crucial in the development of intelligent agents.