UC Merced Electronic Theses and Dissertations

Personalizing Autonomous Driving with Rich Human Guidance

Creative Commons BY-NC 4.0 license
Abstract

With progress in enabling autonomous cars to drive safely on the road, it is time to ask how they should be driving. This dissertation focuses on learning the desired objective function for autonomous cars, with the goal of personalizing autonomous driving: driving according to the passenger's preferences across diverse environments. Traditionally, autonomous cars have been trained from expert demonstrations, with the implicit assumption that the demonstrations truly represent optimal driving. Personalizing autonomous driving under this assumption would mean using Inverse Reinforcement Learning (IRL) to recover the objective function latent in the user's own demonstrations and then adopting the user's own driving style. In this thesis, we question this assumption and propose algorithmic solutions for personalizing driving styles without demonstration data. Through user studies in a simulated driving environment, we first show that people do not want their autonomous cars to drive like them: they want a significantly more defensive car.

Next, we formalize driving preferences as reward functions and propose several algorithms to learn them interactively from an alternative form of human guidance: preference-based learning. In preference-based reward learning, we show users a sequence of trajectory pairs and ask them to indicate their preference within each pair. This approach has been shown to be effective for learning reward functions in the absence of demonstrations; a single comparison, however, is far less informative than a demonstration. The key contribution of this thesis is an algorithmic framework that leverages computational models of human behavior to enable learning from richer preference queries, where the response to each query contains more information than a comparison alone. We propose several forms of rich preference queries: we ask people not only what they prefer, but also why they prefer it, and we design new queries to learn more complex reward functions that can represent preferences in non-stationary environments. We introduce reward dynamics: a mixture of reward functions together with parameters that govern how preferences change in response to the dynamics of the environment. We develop a unified formalism that treats every form of human guidance as an observation about the true preferences, and we use this formalism to derive objective functions for actively generating rich queries.

We show empirically, through simulations and user studies, that richer preference queries learn driving preferences more accurately than comparison-alone queries. We also find that richer queries not only speed up preference learning in practice but also offer more transparency into the decision-making algorithms of the autonomous car, enhancing people's trust in the system. Although the human-robot system of choice in this thesis is the autonomous car, our algorithmic solutions apply to personalizing other human-robot systems in which the robot is a dynamical system that should match human preferences and in which demonstrations are unavailable, whether because of the complexity of operating the robot or because of the disparity between the user's preferences and their demonstrations.
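To make the comparison-based baseline concrete, the following Python sketch illustrates one common way to fit a linear reward function from pairwise preference answers using a Bradley-Terry (logistic) choice model. It is an illustrative sketch, not code from the dissertation: the trajectory features (speed, headway, lane-keeping error), the gradient-ascent loop, and all parameter values are assumptions made here for exposition.

```python
import numpy as np

def feature_vector(trajectory):
    """Map a trajectory to hand-designed reward features.
    Hypothetical features: mean speed, mean headway to the lead car,
    and (negated) lane-keeping error; trajectory is a (T, 3) array."""
    speed, headway, lane_error = trajectory.T
    return np.array([speed.mean(), headway.mean(), -lane_error.mean()])

def fit_reward_weights(queries, n_features=3, lr=0.1, steps=500):
    """Maximum-likelihood estimate of reward weights w from comparison
    answers. Each query is (traj_a, traj_b, a_preferred), and the user
    is modeled as choosing A with probability sigmoid(w . (phi(A) - phi(B)))."""
    w = np.zeros(n_features)
    for _ in range(steps):
        grad = np.zeros(n_features)
        for traj_a, traj_b, a_preferred in queries:
            diff = feature_vector(traj_a) - feature_vector(traj_b)
            p_a = 1.0 / (1.0 + np.exp(-(w @ diff)))
            # Gradient of the Bernoulli log-likelihood with respect to w.
            grad += (float(a_preferred) - p_a) * diff
        w += lr * grad
    return w
```

In this simple model, each comparison contributes only one bit-like observation about the latent weights; the richer queries described in the abstract can be viewed as replacing this observation model with ones that reveal more about the same latent preferences per query.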
