Current service robots perform flawless demos under close human supervision, but they often fail when working autonomously in long-term deployments. This dissertation studies the failures in long-term autonomous service robots and proposes methods to improve their reliability.
We built TritonBot, a receptionist and tour-guide robot, as a realistic example for discovering the failure modes of a long-term autonomous service robot. TritonBot recognizes people's faces, talks to people, and shows them the labs and facilities in a university building. We deployed TritonBot for hundreds of hours to identify failure modes and common issues in service robots.
Building on the experience with TritonBot, we designed two reliability engineering methods to improve the robustness of service robots. First, we found that software encapsulation and dynamic orchestration streamline development workflows and avoid resource contention in service robots. Software encapsulation lets developers pack software into self-contained containers, which simplifies development workflows, and dynamic orchestration schedules components on demand to avoid CPU and memory contention. We developed Rorg, a Linux container-based scheme to manage software components on service robots. Second, we found that simulating a broad spectrum of rare failures at the system level exposes design flaws and improves the robustness of service robots. Design errors in robotics are challenging to discover because finding them requires extensive and resource-demanding testing. Broad-spectrum system-level failure injection exposes both software- and hardware-related design flaws and helps developers reproduce rare failures, verify fixes, and test the robustness of a robot system. We implemented RoboVac, an extensible and convenient fault injection framework that works at the system level and covers many failure patterns seen in long-term autonomous service robot deployments.
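To make the idea of on-demand scheduling concrete, the following is a minimal sketch of dynamic orchestration under a fixed memory budget. It is not Rorg's actual interface: the `Orchestrator` and `Component` classes, the `acquire` and `release` methods, and the memory figures are all hypothetical names and values chosen for illustration. The sketch assumes each component declares its memory footprint up front and that a component with no active users may be stopped to make room for another.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A software component (hypothetical model, not Rorg's data structure)."""
    name: str
    memory_mb: int
    users: int = 0        # number of active requests for this component
    running: bool = False


class Orchestrator:
    """Starts components on demand and stops idle ones to stay within a budget."""

    def __init__(self, memory_budget_mb: int):
        self.budget = memory_budget_mb
        self.components: dict[str, Component] = {}

    def register(self, name: str, memory_mb: int) -> None:
        self.components[name] = Component(name, memory_mb)

    def used_mb(self) -> int:
        return sum(c.memory_mb for c in self.components.values() if c.running)

    def acquire(self, name: str) -> None:
        """Request a component, starting it if it is not already running."""
        comp = self.components[name]
        if not comp.running:
            # Stop idle components, in registration order, until the new one fits.
            for other in self.components.values():
                if self.used_mb() + comp.memory_mb <= self.budget:
                    break
                if other.running and other.users == 0:
                    other.running = False
            if self.used_mb() + comp.memory_mb > self.budget:
                raise RuntimeError(f"not enough memory to start {name}")
            comp.running = True
        comp.users += 1

    def release(self, name: str) -> None:
        """Signal that one user no longer needs the component."""
        comp = self.components[name]
        if comp.users == 0:
            raise ValueError(f"{name} has no active users")
        comp.users -= 1
```

In this sketch a released component keeps running until its memory is needed, so a repeated request avoids a restart. A real container-based scheme would start and stop containers rather than flip a flag, and would also have to account for CPU load and startup latency.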
After working with TritonBot for two years and implementing these reliability engineering methods, we derived a set of design principles for long-term autonomous service robots that apply at different levels of the system hierarchy. These principles guide the design of robust and reliable long-term autonomous service robots.
We combine automated tools, engineering methods, and design principles to build service robots that are available 24x7, and we call this approach "Reliability Engineering for Long-term Deployment of Autonomous Service Robots."