In this work, we focus on the problem of pose estimation in unknown environments, using the measurements from an inertial measurement unit (IMU) and a single camera. We term this estimation task visual-inertial odometry (VIO), in analogy to the well-known visual odometry (VO) problem. Our focus is on developing VIO algorithms for high-precision, consistent motion estimation in real time. The majority of VIO algorithms proposed to date have been developed for systems that are equipped with high-end processors and high-quality sensors. By contrast, we are interested in tracking the motion of small, inexpensive systems with limited processing and sensing resources.
Such resource-constrained systems are common in application areas such as micro aerial vehicles, mobile phones, and augmented reality (AR) devices. Endowing such systems with the capability to accurately track their poses will create a host of new opportunities for commercial applications, and lower the barrier to entry in robotics research and development.
Performing accurate motion estimation on resource-constrained systems requires novel methodologies that address the challenges posed by limited sensing and processing capabilities, and that provide guarantees on the optimal utilization of these resources. To this end, we develop novel, resource-adaptive VIO algorithms based on the extended Kalman filter (EKF) formulation. Specifically, we (i) analyze the properties and performance of existing EKF-based VIO approaches and propose a novel estimator design method that ensures the linearized system models have the correct observability properties, improving the accuracy and consistency of the estimates; (ii) present a methodology for minimizing the computational cost of EKF-based VIO algorithms, which relies on online optimization of the estimator's parameters according to the properties of the environment; (iii) propose an algorithm for joint online calibration of the spatial and temporal relationship between the visual and inertial sensors; and (iv) propose high-fidelity sensor models that enable us to process measurements captured by rolling-shutter cameras and low-cost inertial sensors. We evaluate our estimators on a variety of simulated and real-world data sets, and the results demonstrate that the proposed formulations consistently and accurately track the pose of resource-constrained systems in real time.