Hierarchical Deep Reinforcement Learning For Robotics and Data Science

2018

Abstract

This dissertation explores learning important structural features of a Markov Decision

Process from offline data to significantly improve the sample-efficiency, stability, and robustness

of solutions even with high dimensional action spaces and long time horizons. It

presents applications to surgical robot control, data cleaning, and generating efficient execution

plans for relational queries. The dissertation contributes: (1) Sequential Windowed

Reinforcement Learning: a framework that approximates a long-horizon MDP with a sequence

of shorter term MDPs with smooth quadratic cost functions from a small number

of expert demonstrations, (2) Deep Discovery of Options: an algorithm that discovers hierarchical

structure in the action space from observed demonstrations, (3) AlphaClean: a

system that decomposes a data cleaning task into a set of independent search problems

and uses deep q-learning to share structure across the problems, and (4) Learning Query

Optimizer: a system that observes executions of a dynamic program for SQL query optimization

and learns a model to predict cost-to-go values to greatly speed up future search

problems.

Main Content

For improved accessibility of PDF content, download the file to your device.

UC Berkeley