Robot Imitation by Action Understanding, Mirroring, and Interactions
- Author(s): Liu, Hangxin
- Advisor(s): Zhu, Song-Chun, et al.
This dissertation rethinks the problem of robot imitation learning from human demonstrations and proposes a holistic framework that unifies three challenges: (i) understanding the actions being imitated, (ii) producing proper imitative behaviors, and (iii) interacting effectively with humans. Consider the complex manipulation task of opening a medicine bottle: the key actions of pushing or squeezing that unlock the child-safety cap can hardly be recognized from visual observations alone. A glove-based system is therefore first presented, together with a demonstration-collection pipeline, to understand actions from a functional perspective, which incorporates hand movements and goal states, and from a physical perspective, which focuses on the forces required to reach those states. This heterogeneous information is integrated by a Temporal And-Or Graph (T-AOG) grammar representation, which also captures the hierarchical structure of the task; sampling from the T-AOG generates a valid action sequence that accomplishes the task.

To transfer this skill to a robot, a mirroring approach is then proposed, by which the robot infers functionally equivalent actions that produce a similar force pattern in a physics-based simulation and achieve the same goal of changing object states, naturally bridging action perception and production in robot imitation. Such a grammar representation is further advantageous for tracking object states and accumulating robot knowledge from multiple views over a long period of time; building on these capabilities, a joint inference algorithm is proposed to infer human (false-)beliefs and resolve ambiguity in visual detections.
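To make the sampling step concrete, the sketch below shows how a valid action sequence can be drawn from an And-Or grammar: And-nodes expand all children in temporal order, Or-nodes stochastically select one alternative branch. The node names and the toy bottle-opening grammar are illustrative assumptions, not the dissertation's actual T-AOG.

```python
import random

class Node:
    """A node in a toy Temporal And-Or Graph (illustrative sketch)."""
    def __init__(self, kind, label=None, children=None, weights=None):
        self.kind = kind          # "and" | "or" | "terminal"
        self.label = label        # action label for terminal nodes
        self.children = children or []
        self.weights = weights    # branching probabilities for Or-nodes

def sample(node):
    """Recursively expand the grammar into a temporally ordered action list."""
    if node.kind == "terminal":
        return [node.label]
    if node.kind == "and":
        # And-node: execute all children in temporal order.
        seq = []
        for child in node.children:
            seq.extend(sample(child))
        return seq
    # Or-node: stochastically pick one alternative branch.
    child = random.choices(node.children, weights=node.weights)[0]
    return sample(child)

# Toy grammar for opening a medicine bottle: grasp, then unlock the
# child-safety cap by either pushing or squeezing, then twist and pull.
unlock = Node("or",
              children=[Node("terminal", "push"),
                        Node("terminal", "squeeze")],
              weights=[0.5, 0.5])
open_bottle = Node("and",
                   children=[Node("terminal", "grasp"),
                             unlock,
                             Node("terminal", "twist"),
                             Node("terminal", "pull")])

print(sample(open_bottle))  # e.g. ['grasp', 'squeeze', 'twist', 'pull']
```

Each call to `sample` yields one grammatically valid plan; the Or-node's weights could in principle be learned from the collected demonstrations.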
Finally, this dissertation studies how different forms of explanation generated from this representation foster human trust in a robotic system, and develops an Augmented Reality (AR) interface that allows users to interactively supervise a robot’s decision-making process and intervene by patching its knowledge, represented by the T-AOG. With the T-AOG representation at its core, this dissertation seeks to unify the perception, learning, planning, and interaction problems in robot imitation.
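One way to picture the knowledge-patching intervention is as an edit on the grammar itself: the user flags an Or-branch that never achieves the goal, and it is pruned so the robot stops sampling it. The dictionary-based grammar and branch names below are purely illustrative, not the dissertation's interface.

```python
# A self-contained sketch of "knowledge patching" on a grammar stored as
# symbol -> list of alternative expansions (Or-branches). Illustrative only.

grammar = {
    # And-node: a single ordered expansion.
    "open_bottle": [["grasp", "unlock", "twist", "pull"]],
    # Or-node: alternative ways the robot believes the cap can be unlocked.
    "unlock": [["push"], ["squeeze"], ["twist_only"]],
}

def patch(grammar, symbol, bad_branch):
    """Remove a user-flagged branch from a symbol's alternatives."""
    grammar[symbol] = [b for b in grammar[symbol] if b != bad_branch]

# The user observes that twisting alone never unlocks the child-safety
# cap and patches the robot's knowledge accordingly.
patch(grammar, "unlock", ["twist_only"])
print(grammar["unlock"])  # → [['push'], ['squeeze']]
```

After the patch, any planner that samples from `grammar` can no longer produce the invalid unlocking strategy, which is the intended effect of supervising the robot through its symbolic representation.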