High-resolution, real-time 3D sensing from cameras is a long-standing challenge in the robotics and computer vision communities. This thesis proposes a system-wide optimization strategy for embedded vision systems: a joint software-hardware design approach that co-designs multi-camera arrays and reconstruction algorithms to improve 3D localization and sensing.
The proposed approach builds on state-of-the-art research in multi-camera systems and multi-view methods that simultaneously estimate system pose and 3D scene structure. Existing works are reviewed and summarized to establish a theoretical baseline for the field. To improve the robustness and efficiency of the estimation, geometric and epipolar camera principles are used to derive configuration decisions, which in turn inform the proposed algorithmic methods and lead to highly parallelizable and scalable reconstruction schemes. The estimation pipeline first computes sparse 3D solutions for localization and map initialization, which then serve as a prior for dense reconstruction. A set of semi-dense and dense reconstruction algorithms is introduced that exploits the sensor configuration and ego-motion estimates to improve on state-of-the-art stereo and partial light-field depth estimation methods, in terms of both estimation performance and computational efficiency.
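To make the sparse-to-dense idea concrete, the following minimal sketch (illustrative only, not the thesis implementation) shows how a known relative pose between two cameras yields an epipolar constraint and a triangulated sparse 3D point that could seed a dense depth search. The intrinsics K, rotation R, translation t, and pixel correspondence are hypothetical placeholders, not DevCAM calibration values.

```python
import numpy as np

def skew(v):
    """3x3 skew-symmetric matrix so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Illustrative intrinsics and relative pose (placeholders).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                     # rotation of camera 2 w.r.t. camera 1
t = np.array([0.1, 0.0, 0.0])     # 10 cm baseline along x

# Essential and fundamental matrices from the known configuration.
E = skew(t) @ R
F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)

# A matched pixel pair (hypothetical correspondence).
x1 = np.array([400.0, 250.0, 1.0])
x2 = np.array([380.0, 250.0, 1.0])

# Epipolar residual: x2^T F x1 is zero for a geometrically consistent match.
print("epipolar residual:", x2 @ F @ x1)

# Linear (DLT) triangulation of the sparse point, usable as a dense-depth prior.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t.reshape(3, 1)])
A = np.vstack([x1[0] * P1[2] - P1[0],
               x1[1] * P1[2] - P1[1],
               x2[0] * P2[2] - P2[0],
               x2[1] * P2[2] - P2[1]])
X = np.linalg.svd(A)[2][-1]
print("triangulated point:", X[:3] / X[3])
```

In this toy configuration the 20-pixel disparity with an 800-pixel focal length and 10 cm baseline places the point at roughly 4 m depth, which a dense matcher could use to bound its per-pixel search range.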
Real-world evaluation of the proposed system design is carried out through the development of an open-source multi-camera sensing system, DevCAM. A heterogeneous FPGA, DSP, CPU, and GPU architecture forms the basis of an experimental platform that accommodates 18 high-resolution cameras in a trinocular panoramic configuration, an Inertial Navigation System with differential GPS, and high-performance networking capabilities. On this system, hardware-optimized pre-processing is demonstrated, alongside early results for semi-dense 3D mapping.