Real-Time Monocular Large-Scale Multicore Visual Odometry
- Author(s): Song, Shiyu; et al.
We present a real-time, accurate, large-scale monocular visual odometry system for real-world autonomous outdoor driving applications. This dissertation makes four important contributions. First, we demonstrate robust monocular structure from motion (SFM) through a series of architectural innovations, including a novel epipolar search module that runs in a parallel thread to replenish 3D points through a series of validation mechanisms, and a combination of local and global bundle adjustment that ensures accuracy, robustness, and efficiency. These innovations address the challenge of robust multithreading even for scenes with large motions and rapidly changing imagery, and the design extends to three or more parallel CPU threads. The epipolar search module, generating new 3D points at every frame in parallel with the other threads, together with local bundle adjustment in the primary thread, significantly boosts robustness and accuracy. Second, we demonstrate monocular SFM with accuracy unmatched by the prior state of the art: over several kilometers, we achieve performance similar to stereo and far exceeding other monocular architectures. The key to this performance is scale drift correction using ground plane estimation that combines cues from sparse features and dense stereo. Third, we contribute a data-driven mechanism for cue combination that learns models from training data relating the observation covariance of each cue to the error behavior of its underlying variables; during testing, this allows per-frame adaptation of observation covariances based on relative confidences inferred from the visual data. Finally, we present a framework for highly accurate 3D localization of objects such as cars, and a lane detection system, both built on our SFM poses and ground planes.
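The scale drift correction described above rests on a simple geometric fact: monocular SFM recovers translation only up to an unknown scale, but if the camera's true height above the road is known, the ratio of that height to the height recovered from the estimated ground plane fixes the metric scale. The following is a minimal sketch of that idea, not the dissertation's implementation; the function name and the 1.7 m camera height are assumptions for illustration.

```python
import numpy as np

def correct_scale(t_rel, h_estimated, h_camera=1.7):
    """Rescale a monocular relative translation using the ground plane.

    t_rel       : up-to-scale relative translation from monocular SFM.
    h_estimated : camera height above the estimated ground plane, in the
                  (drifted) reconstruction's units.
    h_camera    : known true camera height in meters (hypothetical value).

    The ratio h_camera / h_estimated recovers the metric scale factor,
    so the returned translation is in meters.
    """
    scale = h_camera / h_estimated
    return scale * np.asarray(t_rel, dtype=float)
```

For example, if scale drift has shrunk the reconstruction so the estimated camera height is 0.85 while the true height is 1.7 m, the factor 2.0 restores a translation of `[0.5, 0.0, 0.1]` to `[1.0, 0.0, 0.2]`.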
Our accurate ground planes and SFM poses readily benefit 3D localization frameworks and vision-based lane detection in these applications, as demonstrated by our experiments. The baseline SFM system is optimized to output pose within 50 ms in the worst case, while average-case operation exceeds 30 fps. Evaluations on the challenging KITTI dataset for autonomous driving show better rotation and translation accuracy than other state-of-the-art systems.
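The cue-combination mechanism described above weights each cue (sparse features, dense stereo) by an observation covariance adapted per frame. A standard way to combine independent estimates once their variances are known is inverse-variance weighting; the sketch below shows that fusion step only, under the assumption of independent scalar cues, and does not reproduce the dissertation's learned covariance models. Function name and inputs are illustrative.

```python
import numpy as np

def fuse_cues(values, variances):
    """Fuse independent scalar estimates of one quantity (e.g. the ground
    plane height from sparse features and from dense stereo) by
    inverse-variance weighting.

    Returns the fused estimate and its variance. A cue with a small
    variance (high confidence) dominates; per-frame adaptation of the
    variances shifts this weighting from frame to frame.
    """
    values = np.asarray(values, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances              # inverse-variance weights
    fused_var = 1.0 / weights.sum()        # variance of the fused estimate
    fused = fused_var * (weights * values).sum()
    return fused, fused_var
```

For example, fusing heights 1.6 and 1.8 with variances 0.01 and 0.03 yields 1.65, pulled toward the more confident cue, with a fused variance of 0.0075 that is smaller than either input.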