Robotics has seen increasing success in automating a wide variety of tasks in structured settings, such as factories and assembly lines. However, using robots for reliable automation in less structured, open-world environments remains a critical challenge. Recent advances in machine learning have played a central role in the development of agents that can bridge this gap by exploring the environment online and leveraging collected experiences to modify their behavior. For these algorithms to be broadly applicable in practice, they must be both efficient enough to direct exploration towards regions that are relevant for learning new tasks and safe and reliable enough to deploy in the physical world. However, this is exceedingly challenging if these algorithms must discover promising and unpromising behaviors entirely from their own online experience, without any additional supervision indicating which behaviors to emulate and which to avoid.
This dissertation introduces a number of online learning algorithms for learning from demonstrations, reinforcement learning, and bandit exploration that use scalable sources of supervision, from both sparing human queries and offline datasets, to structure online exploration and thereby facilitate safe and efficient robot learning. I will discuss both the theoretical properties of these algorithms and their evaluation in a number of environments with uncertain dynamics, both in simulation and on robotic manipulation tasks in the physical world. I will conclude by discussing opportunities for future work on improving the scalability and applicability of online learning algorithms for robot learning in practice.