3D Semantic Scene Understanding and Reconstruction
Feng, Qiaojun
Advisor(s): Atanasov, Nikolay
Abstract
Semantic understanding and reconstruction of the surrounding 3D environment is a prerequisite for intelligent robots to autonomously carry out tasks such as environment exploration, surveillance, autonomous driving, and indoor household and healthcare services. Although progress on semantic understanding of 2D images has been impressive, building a 3D-consistent, meaningful, yet compact and scalable semantic map for robotics applications remains challenging. In this dissertation, we develop 3D semantic mapping approaches in different forms, including dense semantic maps and object-level semantic maps.

We propose TerrainMesh, a dense semantic map in the form of a 3D mesh, for terrain reconstruction from aerial images and sparse depth measurements. A joint 2D-3D, geometric-semantic learning framework reconstructs local semantic meshes, and a global semantic mesh is obtained by merging the local meshes with the help of a SLAM algorithm.

We then investigate object-level semantic maps constructed from 3D measurements. We propose CORSAIR, a retrieval and registration algorithm for point-cloud objects. A 3D sparse convolutional neural network is trained to extract global features for similar-shape retrieval and local per-point features for correspondence generation, enabling pose registration with the help of object symmetry. We develop ELLIPSDF, a bi-level object shape model that combines a coarse 3D ellipsoid with a fine continuous 3D signed distance function (SDF), together with an approach to initialize and optimize the object pose and the bi-level shape from multiple depth-image observations.

We also propose object-level semantic mapping from 2D images and investigate its connection to localization. We introduce an object mesh model with an observation model based on semantic instance segmentation and semantic keypoints, derive an observation residual function, and minimize it to optimize both the object states and the camera poses. We develop OrcVIO, an object residual constrained visual-inertial odometry that models objects as ellipsoids with semantic keypoints. The observation residuals between an ellipsoid and its 2D semantic bounding box and semantic keypoints are integrated into the MSCKF framework for online, tightly coupled estimation of object and IMU-camera states. The resulting object-level semantic map provides a meaningful yet efficient representation of the environment. Finally, we discuss potential directions for extending 3D semantic understanding techniques in robotics.
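The registration step described for CORSAIR reduces, once per-point features have produced putative correspondences, to solving for a rigid transform between two point sets. Below is a minimal sketch of that last stage using the closed-form Kabsch/Umeyama solution, assuming correspondences are already given; the feature extraction, symmetry handling, and outlier rejection are omitted, and names such as `estimate_rigid_transform` are illustrative rather than from the dissertation.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Closed-form least-squares rigid alignment (Kabsch/Umeyama, no scale).

    src, dst: (N, 3) arrays of corresponding points. Returns R (3x3), t (3,)
    such that R @ src[i] + t ~= dst[i].
    """
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    # Cross-covariance of the centered correspondences.
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force det(R) = +1 so the result is a proper rotation.
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Tiny self-check with a known pose (hypothetical data, not from the thesis).
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))
angle = np.pi / 5
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -1.0, 2.0])
R_est, t_est = estimate_rigid_transform(pts, pts @ R_true.T + t_true)
assert np.allclose(R_est, R_true, atol=1e-8)
assert np.allclose(t_est, t_true, atol=1e-8)
```

In practice this solver is typically wrapped in a RANSAC loop, since learned correspondences contain outliers; the closed-form step above is only the inner model fit.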
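The coupling of object and camera states in OrcVIO hinges on a residual between a projected ellipsoid and its detected 2D bounding box. The sketch below shows one common way to form such a residual via the dual-quadric formulation (as popularized by QuadricSLAM-style methods): the ellipsoid's dual quadric is projected to a dual conic, and each bounding-box edge line should be tangent to it. This is a hedged illustration of the geometric idea, not the dissertation's exact residual; all variable names are ours.

```python
import numpy as np

def bbox_tangency_residual(K, T_wc, T_wo, semi_axes, bbox):
    """Algebraic tangency residual between a projected ellipsoid and a 2D box.

    K         : (3, 3) camera intrinsics.
    T_wc      : (4, 4) camera-to-world pose; T_wo: (4, 4) object-to-world pose.
    semi_axes : (3,) ellipsoid semi-axes (a, b, c) in the object frame.
    bbox      : (u_min, v_min, u_max, v_max) detected bounding box in pixels.
    Returns a length-4 residual, one entry per box edge; an entry is zero when
    that edge line is exactly tangent to the projected ellipse.
    """
    # Dual quadric of the ellipsoid in its own frame, then in the world frame
    # (dual quadrics transform as Q' = T Q T^T under a point transform T).
    a, b, c = semi_axes
    Q_obj = np.diag([a**2, b**2, c**2, -1.0])
    Q_world = T_wo @ Q_obj @ T_wo.T
    # 3x4 projection matrix mapping homogeneous world points to pixels.
    T_cw = np.linalg.inv(T_wc)
    P = K @ T_cw[:3, :]
    # Dual conic of the ellipsoid's image outline.
    C = P @ Q_world @ P.T
    u_min, v_min, u_max, v_max = bbox
    # Each box edge as a homogeneous image line l = (a, b, c): a*u + b*v + c = 0.
    lines = np.array([[1.0, 0.0, -u_min],
                      [1.0, 0.0, -u_max],
                      [0.0, 1.0, -v_min],
                      [0.0, 1.0, -v_max]])
    # Tangency condition: l^T C l = 0 for each edge line.
    return np.einsum('ij,jk,ik->i', lines, C, lines)
```

Since l^T C l is only defined up to the overall scale of C, such residuals are normalized in practice before being stacked with semantic-keypoint reprojection errors and IMU propagation terms in the filter update.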