Robots have tremendous potential to help us in our daily lives. However, one key obstacle to their autonomy is that they lack the ability to perceive novel, or unseen, objects. The de-facto solution to this problem is to pre-program robots with a large corpus of known objects in the hope that they will understand every object they encounter. However, when robots need to understand new objects, they must be manually re-programmed to do so, which has proven time-consuming and expensive, and is fundamentally intractable. A more direct approach is to leverage a robot's context, e.g., its immediate surroundings, which can be a rich source of information from which to learn about unseen objects in a scalable manner.
The goal of my research is to design algorithms and systems that enable robots to automatically discover unseen objects from their surroundings in a manner that is fast and robust to real-world vision challenges. In this dissertation, I discuss four key contributions of my work. First, I designed Salient Depth Partitioning (SDP), a novel depth-based region cropping algorithm that reduces the computation time of existing object detectors by up to 30%, with no discernible change in accuracy. SDP runs in real time and is designed to give robots a better sense of visual attention, guiding them toward visual regions that are likely to contain semantically important (i.e., salient) elements. Consequently, SDP can be used as a preprocessing step to improve the computational efficiency of depth-based object detectors on mobile robots.
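To make the idea of depth-based region cropping concrete, the following is a minimal sketch, not SDP itself: it crops a frame to the bounding box of pixels whose depth falls inside a plausible near-field range, so a downstream detector processes fewer pixels. The function name and depth thresholds are hypothetical.

```python
import numpy as np

def depth_crop(rgb, depth, near=0.5, far=2.0):
    """Crop the frame to the bounding box of pixels whose depth lies in
    [near, far] metres. Illustrative only: the thresholds and the
    single-box policy are assumptions, not SDP's actual procedure."""
    mask = (depth >= near) & (depth <= far)
    if not mask.any():
        return rgb  # nothing in range: fall back to the full frame
    ys, xs = np.nonzero(mask)
    return rgb[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

# Toy 8x8 scene: a 3x3 "object" at 1 m against a 5 m background.
depth = np.full((8, 8), 5.0)
depth[2:5, 3:6] = 1.0
rgb = np.zeros((8, 8, 3), dtype=np.uint8)
crop = depth_crop(rgb, depth)
print(crop.shape)  # (3, 3, 3): the detector now sees 9 pixels instead of 64
```

The speedup in this toy case comes purely from shrinking the detector's input; the cropping itself is a cheap mask-and-slice operation.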
Second, I demonstrated that object proposal algorithms, a ubiquitous algorithmic component in machine vision systems, do not translate well to real-world contexts, which can negatively impact the performance of robots. I conducted a study exploring how these algorithms are affected by real-world robot vision challenges such as noise, blur, contrast, and brightness. I also investigated their performance on hardware with limited memory, CPU, and GPU resources to mimic the constraints faced by mobile robots. To my knowledge, I am the first to investigate object proposal algorithms for robotics applications. My results suggest that object proposal algorithms do not generalize to real-world challenges, in direct contrast to what is claimed in the computer vision literature. This work contributes to the field by demonstrating the need for better evaluation protocols and datasets, which will lead to more robust unseen object discovery for robots.
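One common way to probe robustness of this kind is to corrupt benchmark images with synthetic perturbations and re-measure proposal recall on the corrupted copies. The sketch below shows three such perturbations (noise, brightness, contrast); the function names and parameter values are illustrative assumptions, not the study's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(img, sigma=10.0):
    """Additive Gaussian sensor noise."""
    out = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(out, 0, 255).astype(np.uint8)

def shift_brightness(img, delta=40):
    """Global brightness shift (e.g., under- or over-exposure)."""
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def scale_contrast(img, gain=1.5):
    """Stretch pixel values about the image mean."""
    mean = img.mean()
    out = (img.astype(np.float64) - mean) * gain + mean
    return np.clip(out, 0, 255).astype(np.uint8)

# Apply each perturbation to a random grayscale frame; a robustness study
# would then run the proposal algorithm on both clean and corrupted copies.
gray = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
corrupted = {f.__name__: f(gray) for f in (add_noise, shift_brightness, scale_contrast)}
```

Comparing recall on `gray` versus each entry of `corrupted` quantifies how much a given challenge degrades a proposal algorithm.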
Third, I developed Unsupervised Foraging of Objects (UFO), a novel, unsupervised method that can automatically discover unseen salient objects. UFO is substantially faster than existing methods, robust to real-world perturbations (e.g., noise and blur), and achieves state-of-the-art performance. Unlike existing approaches, UFO leverages object proposals and a parallel discover-prediction paradigm. This allows UFO to quickly discover arbitrary, salient objects on a frame-by-frame basis, which can help robots engage in scalable object learning. I compared UFO to two of the fastest and most accurate methods for unsupervised salient object discovery at the time of writing (Fast Segmentation and Saliency-Aware Geodesic) and found it to be 6.5 times faster while achieving state-of-the-art precision, recall, and accuracy. Furthermore, I show that UFO is robust to real-world perception challenges encountered by robots, including moving cameras and moving objects, motion blur, and occlusion. This work lays the foundation for faster online object discovery, contributing toward future methods that will enable robots to learn about new objects through observation.
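The precision, recall, and accuracy figures above are standard pixel-wise mask metrics; a minimal sketch of how they can be computed for a predicted binary object mask follows. This is a toy illustration of the metrics themselves, not UFO's evaluation code.

```python
import numpy as np

def mask_metrics(pred, gt):
    """Pixel-wise precision, recall, and accuracy of a predicted binary
    object mask against a ground-truth mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)    # object pixels correctly discovered
    fp = np.sum(pred & ~gt)   # background mistaken for object
    fn = np.sum(~pred & gt)   # object pixels missed
    tn = np.sum(~pred & ~gt)  # background correctly ignored
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / pred.size
    return float(precision), float(recall), float(accuracy)

gt = np.zeros((4, 4), dtype=bool); gt[0:2, 0:2] = True     # true object
pred = np.zeros((4, 4), dtype=bool); pred[0:2, 1:3] = True  # shifted guess
print(mask_metrics(pred, gt))  # (0.5, 0.5, 0.75)
```

Note that accuracy alone can be misleading when the object is small relative to the frame, which is why precision and recall are reported alongside it.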
Fourth, I designed RaccooNet, a new real-time object proposal algorithm for robot perception. To my knowledge, RaccooNet is the fastest object proposal algorithm to date, running at 47.9 fps while achieving recall comparable to the state-of-the-art (e.g., RPN, Guided Anchoring). Additionally, I introduced a novel intersection-over-union (IoU) overlap confidence prediction module, which allows RaccooNet to recall more objects with fewer object proposals, thus improving its efficiency. I also designed a faster variant, RaccooNet Mobile, which runs at 171 fps, over ten times faster than the state-of-the-art. Through experiments on an embedded device, I demonstrated that my algorithms are suitable for computationally resource-constrained mobile robots. I validated RaccooNet and RaccooNet Mobile on three real-world robot vision datasets (RGBD-scenes, ARID, and ETH Bahnhof) and showed that they are robust to vision challenges such as blur, motion, lighting, and object scale. This work contributes to the field by introducing a real-time object proposal algorithm that will serve as a foundation for new real-time object discovery methods for mobile robots.
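For reference, the IoU overlap that the confidence module learns to predict is the standard box-overlap measure. The sketch below computes it for axis-aligned (x1, y1, x2, y2) boxes and shows why a good IoU estimate helps: ranking proposals by it lets a system keep fewer proposals without losing the well-fitting ones. This is the metric itself, not RaccooNet's prediction module.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Ranking proposals by (here, true) IoU keeps the tight boxes at the top,
# so a short proposal list still covers the object.
gt_box = (10, 10, 50, 50)
proposals = [(12, 12, 48, 52), (0, 0, 20, 20), (40, 40, 90, 90)]
ranked = sorted(proposals, key=lambda p: iou(p, gt_box), reverse=True)
print(ranked[0])  # (12, 12, 48, 52), the tightest match
```

In practice the ground-truth box is unknown at inference time, which is exactly why a module that predicts IoU from image features is useful for ordering proposals.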
In summary, my doctoral research contributes to building real-time object perception systems that can be deployed on real-world robots operating in the wild. This work will ultimately lead to more scalable object perception frameworks that learn directly from the environment, on the fly. Moreover, my research will allow roboticists to build smarter robots that will one day be more seamlessly integrated into our daily lives and become the useful machines we envision for our future.