Autonomous driving technologies have evolved rapidly in recent years. However, a key bottleneck is the reliance on high-definition (HD) maps, which provide rich features such as lane lines and lane connectivity with centimeter-level accuracy. These maps are crucial to downstream tasks such as ego-vehicle localization, agent trajectory prediction, and path planning. Nevertheless, HD maps are costly to create and maintain, requiring time-consuming data collection and labor-intensive labeling. This cost is a major obstacle to scaling autonomous driving.
We investigate two alternatives to HD maps: automating offline map generation and building maps online. To generate robust offline maps, we use a probabilistic semantic grid that accumulates semantic labels from point clouds fused with semantic images. We incorporate the confusion matrix of the semantic segmentation model into the accumulation, which significantly improves accuracy on the challenging lane-boundary class.
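As a minimal sketch of this idea, consider a per-cell Bayesian update that weighs each observed label by the segmentation model's confusion matrix; the class set, matrix values, and update rule below are illustrative assumptions, not the exact formulation:

```python
import numpy as np

# Hypothetical classes and a row-normalized confusion matrix C, where
# C[i, j] = P(predicted class j | true class i), estimated on held-out data.
CLASSES = ["road", "lane_boundary", "sidewalk", "background"]
C = np.array([
    [0.90, 0.05, 0.03, 0.02],
    [0.30, 0.60, 0.05, 0.05],  # lane boundaries are often mislabeled as road
    [0.05, 0.02, 0.88, 0.05],
    [0.02, 0.03, 0.05, 0.90],
])

def update_cell(log_posterior, predicted_class):
    """Bayesian update of one grid cell's per-class log-posterior.

    Instead of trusting the hard label, weight each candidate true class
    by the likelihood that the model would emit `predicted_class` for it.
    """
    likelihood = C[:, predicted_class]          # P(observation | true class)
    return log_posterior + np.log(likelihood)   # unnormalized log posterior

# Accumulate a stream of noisy labels projected into one cell.
log_post = np.log(np.full(len(CLASSES), 1.0 / len(CLASSES)))  # uniform prior
for obs in [1, 0, 1, 1, 0]:  # mixed road / lane_boundary observations
    log_post = update_cell(log_post, obs)

posterior = np.exp(log_post - log_post.max())
posterior /= posterior.sum()
print(CLASSES[int(posterior.argmax())], posterior.round(3))
```

In this toy example the cell is still classified as a lane boundary despite two "road" votes, because the confusion matrix knows the model frequently mislabels boundaries as road.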
While offline map generation removes human annotation, it still requires data collection to keep maps up to date. Online mapping, which builds maps while driving, addresses this problem, and recent advances in low-cost, vision-only online mapping have increased its adoption. However, maps built online have limited range due to heavy occlusion. We propose using low-maintenance standard-definition (SD) maps, such as OpenStreetMap (OSM), to enhance online mapping. We show that integrating SD maps via a rasterized representation boosts performance, and that a graph-based representation can additionally yield a more lightweight network without sacrificing performance.
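A minimal sketch of the rasterized SD-map prior is shown below; it assumes ego-frame road centerlines have already been extracted from OSM as polylines, and the grid size, resolution, and line thickness are illustrative choices:

```python
import numpy as np
import cv2  # OpenCV, used here only for polyline rasterization

def rasterize_sd_map(polylines, bev_size=200, resolution=0.5):
    """Rasterize SD-map polylines (ego frame, meters) into a BEV mask.

    polylines: list of (N, 2) arrays of (x, y) road-centerline points.
    Returns a (bev_size, bev_size) uint8 mask that can be concatenated
    to the model's BEV features as an extra input channel.
    """
    mask = np.zeros((bev_size, bev_size), dtype=np.uint8)
    for line in polylines:
        # meters -> pixel coordinates, with the ego at the grid center
        px = (line / resolution + bev_size / 2).astype(np.int32)
        cv2.polylines(mask, [px.reshape(-1, 1, 2)], isClosed=False,
                      color=1, thickness=2)
    return mask

# Example: one straight road segment passing through the ego position.
road = np.array([[-40.0, 0.0], [40.0, 0.0]])
prior = rasterize_sd_map([road])
print(prior.shape, prior.sum())  # (200, 200), number of road pixels
```

The graph-based alternative would instead feed the polyline vertices to the network directly (e.g., as tokens), avoiding the dense raster and the convolutional layers needed to encode it, which is where the lightweight-network benefit comes from.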
As online mapping gains popularity, challenges arise when deploying models across a wide range of platforms: performance degrades significantly when camera parameters and mounting positions change, and it is expensive to collect and annotate data to retrain a new model for every sensor configuration. We propose sensor-configuration-agnostic semantic mapping, which generates an intermediate semantic map to bridge the representation gap, and show that it significantly improves zero-shot transfer performance. Additionally, we build a pipeline that uses Gaussian splatting to synthesize novel-view data, recreating the dataset under a different sensor configuration. We show that this synthesized dataset is effective for data augmentation and pretraining.
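The geometric core of the novel-view step can be sketched as follows: the scene is reconstructed once with Gaussian splatting from the source rig's images, and frames for the target rig are obtained by rendering from re-mounted camera poses. The function and transform names below are hypothetical, and the actual splat rendering is omitted:

```python
import numpy as np

def target_camera_poses(ego_poses, T_ego_cam_tgt):
    """World poses from which to re-render a Gaussian-splat scene.

    ego_poses: list of 4x4 world-from-ego transforms along the driven
    trajectory. T_ego_cam_tgt: 4x4 ego-from-camera extrinsics of the
    *target* sensor configuration. The reconstructed scene is fixed;
    only the render poses (and intrinsics) change per target rig.
    """
    return [T_we @ T_ego_cam_tgt for T_we in ego_poses]

# Example: the target rig mounts the camera 1.0 m forward and 0.5 m higher.
T_tgt = np.eye(4)
T_tgt[0, 3], T_tgt[2, 3] = 1.0, 0.5
poses = target_camera_poses([np.eye(4)], T_tgt)
print(poses[0])
```

Rendering the reconstructed scene at these poses yields images as if the original drive had been recorded with the target sensor configuration, which is what makes the synthesized dataset usable for augmentation or pretraining.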