Robust Perception and Auto-teaching for Autonomous Robotic Systems
- Author(s): Wang, Zining
- Advisor(s): Tomizuka, Masayoshi
- et al.
Modern autonomous robotic systems are equipped with perception subsystems to handle unexpected failure cases and to navigate more intelligently in unstructured environments. Robots navigate in cluttered environments full of noise and disturbances. Robust perception extracts target objects from visual observations while rejecting noise and disturbances. It also increases redundancy by fusing information from multiple sensors. However, the vision sensor is mounted on the robot and is therefore affected by the model uncertainty of the robot. Auto-teaching is proposed to handle this modeling error by calibrating the model parameters while estimating the states of the target object. Conversely, robust detection of target objects is a prerequisite for auto-teaching without human intervention, so perception and auto-teaching must be studied together. In addition, perception is essential for understanding complex environments, where deep learning methods have recently become mainstream. However, the robustness of learning-based perception algorithms is not well explored.
In this dissertation, robust perception is discussed for robots carrying vision sensors, and auto-teaching is developed for robots to recover from failures. Robustness of the perception subsystem is addressed by developing global methods to reject disturbances and sensor fusion methods to improve redundancy. Several methods are proposed in both classic computer vision and deep learning, with applications to two kinds of autonomous robotic systems, namely the industrial manipulator and the autonomous vehicle.
For the industrial manipulator discussed in Part I of this dissertation, the term hand-eye system conventionally refers to a robot arm holding vision sensors. Chapter 2 models the system and builds the motion block. The kinematic model is used for visual-inertial sensor fusion and for generating the calibration parameters for auto-teaching. Planning and tracking control of the system are necessary for auto-teaching and for ensuring the quality of the visual data captured by the hand-eye system.
Industrial manipulators and their target objects have rich geometric information and accurately known shapes, which makes them well suited to classic computer vision (CV) methods. Chapters 3 and 4 construct the robust perception block of the hand-eye system. Chapter 3 proposes several global shape matching methods for two kinds of visual inputs, namely images and point clouds. We globally search all potential matches of deformed target objects to avoid local optima caused by disturbances. Chapter 4 introduces probabilistic inference to increase the robustness against noise when matching detected objects temporally. The proposed probabilistic hierarchical registration algorithm outperforms the deterministic feature descriptor-based algorithm used in state-of-the-art SLAM methods.
Visual detection is not robust against model uncertainty of the system and gives only the 2D location of the object. Auto-teaching simultaneously calibrates the parameters of the system while estimating the state of detected objects. Chapter 5 introduces the auto-teaching framework, which directly uses the perception results from Chapters 3 and 4. Visual-inertial sensor fusion is used to increase the calibration accuracy by taking the robot motion measurement into account. Chapter 6 proposes an active auto-teaching framework that closes the calibration loop of the hand-eye system by planning optimal measurement poses using the updated parameters.
Autonomous vehicles operate in more varied scenarios where target objects are complex and unstructured. Deep learning-based methods have become the paradigm in this area in recent years, but robustness is the major concern for scaling up their application in the real world. Part II of the dissertation discusses the robustness of learning-based detectors. Chapter 7 proposes two camera-LiDAR sensor fusion detection networks to increase the performance and redundancy of the detector. The proposed fusion layer is efficient and supports back-propagation, which suits the learning framework. Chapter 8 examines the training and evaluation procedure of learning-based detectors. A probabilistic representation is proposed for labels in the dataset to handle the uncertainty of training data. A new evaluation metric is introduced for the proposed probabilistic representation to better measure the robustness of learning-based detectors.