On and Off-Policy Deep Imitation Learning for Robotics
- Author(s): Laskey, Michael David
- Advisor(s): Goldberg, Ken; et al.
Deep Imitation Learning offers an alternative to explicitly programming robots, but it has two drawbacks: sample complexity and covariate shift. One approach to Imitation Learning is Behavior Cloning, in which a robot observes a supervisor and then infers a control policy. A known problem with this approach is that even slight departures from the supervisor's demonstrations can compound over the policy's roll-out, resulting in errors; this drift and the resulting error are commonly referred to as covariate shift. On-Policy techniques reduce covariate shift by iteratively collecting corrective actions for the current robot policy. To reduce the sample complexity of these approaches, we propose a novel active learning algorithm, SHIV (Svm-based reduction in Human InterVention).

While evaluating SHIV, we reconsider the trade-off between Off- and On-Policy methods and find that: 1) On-Policy methods are challenging for human supervisors, and 2) performance varies with the expressiveness of the policy class. To make Off-Policy methods more robust for expressive policies, we propose a second algorithm, DART (Disturbances Augmenting Robot Trajectories), which injects optimized noise into the supervisor's control stream to simulate error during data collection.

This dissertation contributes the two aforementioned algorithms; experimental evaluations on three robots, measuring performance on tasks ranging from grasping in clutter to singulation to bed-making; and the design of a novel first-order urban driving simulator (FLUIDS) that can fill gaps in existing benchmarks for Imitation Learning by rapidly testing algorithm performance in terms of generalization.
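To make the Behavior Cloning setup concrete, the following is a minimal sketch of inferring a control policy from supervisor demonstrations by supervised regression. The linear supervisor, toy state space, and least-squares policy class are illustrative assumptions, not the setup used in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed supervisor for illustration: a fixed linear controller u = K @ x.
K_true = np.array([[1.0, -0.5], [0.3, 0.8]])

def supervisor(x):
    return K_true @ x

# Collect demonstrations: states the supervisor visits, with its actions.
states = rng.normal(size=(200, 2))
actions = np.array([supervisor(x) for x in states])

# Behavior Cloning: fit the policy to the (state, action) pairs offline.
W, *_ = np.linalg.lstsq(states, actions, rcond=None)
K_hat = W.T

# On the demonstration distribution the cloned policy matches the supervisor;
# covariate shift arises only once the robot drifts to states the
# demonstrations never covered.
print(np.allclose(K_hat, K_true, atol=1e-6))
```

Because the labels here are noise-free and exactly linear in the state, the regression recovers the supervisor exactly; with a real supervisor and an expressive policy class, small fitting errors remain and can compound at roll-out time.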
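The On-Policy correction loop described above can be sketched as follows, in the style of dataset aggregation: roll out the current robot policy, ask the supervisor to label the states the robot actually visits, aggregate, and refit. SHIV additionally uses an SVM-based rule to decide when to query the supervisor; that selection step is omitted here, and the toy linear system and linear policy are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
K_true = np.array([[1.0, -0.5], [0.3, 0.8]])  # assumed linear supervisor

def supervisor(x):
    return K_true @ x

def rollout(K, x0, horizon=20):
    """Roll out a linear policy u = K @ x in a toy linear system."""
    xs, x = [], x0
    for _ in range(horizon):
        xs.append(x)
        x = 0.9 * x + 0.1 * (K @ x) + rng.normal(scale=0.01, size=2)
    return xs

# Start from a poor initial policy and iteratively aggregate corrections.
K_hat = np.zeros((2, 2))
data_x, data_u = [], []
for _ in range(10):
    for x in rollout(K_hat, rng.normal(size=2)):
        data_x.append(x)
        data_u.append(supervisor(x))  # corrective label on an on-policy state
    X, U = np.array(data_x), np.array(data_u)
    W, *_ = np.linalg.lstsq(X, U, rcond=None)
    K_hat = W.T

print(np.allclose(K_hat, K_true, atol=1e-3))
```

The key point is that labels are gathered on the robot's own state distribution, which is what suppresses covariate shift, but every iteration requires the supervisor to provide corrections in real time.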
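A minimal sketch of DART-style data collection follows: noise is injected into the supervisor's executed controls while the intended (noise-free) action is recorded as the label, so the demonstrations cover the error states the robot will later visit. DART optimizes the injected noise level to match the robot's error; the fixed Gaussian covariance, linear supervisor, and toy dynamics below are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
K_true = np.array([[1.0, -0.5], [0.3, 0.8]])  # assumed linear supervisor

def supervisor(x):
    return K_true @ x

noise_cov = 0.05 * np.eye(2)  # assumed fixed here; DART tunes this iteratively

data_x, data_u = [], []
x = rng.normal(size=2)
for _ in range(300):
    u_intended = supervisor(x)
    data_x.append(x)
    data_u.append(u_intended)  # label with the noise-free intended action
    # Execute a perturbed action, driving the supervisor into error states.
    u_executed = u_intended + rng.multivariate_normal(np.zeros(2), noise_cov)
    x = 0.9 * x + 0.1 * u_executed  # toy linear dynamics

# Standard Off-Policy (Behavior Cloning) fit on the perturbed demonstrations.
X, U = np.array(data_x), np.array(data_u)
W, *_ = np.linalg.lstsq(X, U, rcond=None)
print(np.allclose(W.T, K_true, atol=1e-6))
```

Because the supervisor, not the robot, absorbs the disturbances, data collection remains Off-Policy and places no extra real-time burden on the human, while the dataset still includes recovery behavior from perturbed states.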