Rapid urbanization over the past few decades has been accompanied by significant population growth, rural-to-urban migration, and extensive infrastructure development. This expansion necessitates advanced methodologies to effectively understand, interpret, and manage complex urban environments. Visual learning techniques are essential for achieving accurate 2D and 3D urban perception.
This dissertation explores robust visual learning techniques for 2D and 3D urban perception through three primary studies: (1) the development of a novel CNN-based approach for small object detection in urban settings, (2) the examination of a semi-supervised method for urban road extraction from satellite imagery, and (3) the implementation of a focal diffusion process for object-aware 3D LiDAR data generation.
The first study presents a CNN framework for detecting small urban elements, introducing the Reduced Downsampling Network (RD-Net) for high-resolution feature extraction and the Adjustable Sample Selection (ADSS) module to optimize training sample selection. A Generalized Intersection over Union (GIoU) loss function further enhances bounding-box regression accuracy. The framework demonstrates superior performance and robustness compared to conventional CNN-based detectors.
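To make the GIoU term concrete, the following is a minimal, framework-free sketch of the standard GIoU computation for axis-aligned boxes in (x1, y1, x2, y2) format; it illustrates the general formulation rather than the exact implementation used in the dissertation. Unlike plain IoU, GIoU stays informative for non-overlapping boxes via the smallest enclosing box C:

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2), x2 > x1, y2 > y1."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Intersection area (zero if the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Smallest enclosing box C covering both a and b.
    area_c = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    # GIoU = IoU minus the fraction of C not covered by the union.
    return inter / union - (area_c - union) / area_c

def giou_loss(a, b):
    """Regression loss: 1 - GIoU, in [0, 2]; 0 only for identical boxes."""
    return 1.0 - giou(a, b)
```

For two disjoint boxes the IoU gradient vanishes, while the GIoU penalty still shrinks as the boxes move closer, which is what makes it useful for bounding-box regression.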
The second study addresses the challenge of limited labeled data for road extraction by proposing a semi-supervised learning approach. This method combines a small set of labeled data with a large corpus of unlabeled data, employing pixel-wise contrastive loss for self-supervised training. An iterative process generates pseudo-labels, filters high-quality labels, and retrains the network, significantly improving road extraction accuracy from satellite images and outperforming existing semi-supervised methods.
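The pseudo-label filtering step of the iterative process can be sketched as follows. This is an illustrative confidence-thresholding scheme, not the dissertation's exact filter: the thresholds `hi` and `lo` and the ignore index `-1` are assumptions for the sake of the example.

```python
import numpy as np

def filter_pseudo_labels(road_probs, hi=0.9, lo=0.1):
    """Keep only confident pixel predictions as pseudo-labels.

    road_probs: (H, W) array of predicted road probabilities in [0, 1].
    Returns int labels: 1 = road, 0 = background, -1 = ignore (uncertain).
    """
    labels = (road_probs >= 0.5).astype(np.int64)
    confident = (road_probs >= hi) | (road_probs <= lo)
    labels[~confident] = -1  # uncertain pixels are excluded from retraining
    return labels

# Iterative loop skeleton (predict/retrain steps elided):
# 1. predict road_probs on unlabeled images with the current network
# 2. pseudo = filter_pseudo_labels(road_probs)
# 3. retrain on labeled data + confident pseudo-labels, then repeat
```

Masking low-confidence pixels keeps noisy pseudo-labels out of the loss, which is what allows each retraining round to improve rather than reinforce early mistakes.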
The third study explores a variant of the Denoising Diffusion Probabilistic Model for 3D LiDAR data generation. An object focal loss is employed to address challenges such as foreground sparsity and intra-class variance during training, resulting in LiDAR data with clearer and more distinct foreground features—crucial for urban planning and autonomous systems.
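One plausible reading of the training objective is a DDPM noise-prediction loss in which foreground (object) points are upweighted. The sketch below shows the standard DDPM forward-noising step and a simple foreground-weighted MSE; the `fg_weight` parameter and the per-point weighting scheme are illustrative assumptions, and the dissertation's object focal loss may modulate weights differently:

```python
import numpy as np

def forward_diffuse(x0, alpha_bar_t, rng):
    """Standard DDPM forward step: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return xt, eps

def object_focal_mse(pred_eps, eps, fg_mask, fg_weight=5.0):
    """Noise-prediction MSE with sparse foreground points upweighted.

    fg_mask: boolean array marking object (foreground) points.
    """
    per_point = (pred_eps - eps) ** 2
    w = np.where(fg_mask, fg_weight, 1.0)
    return float((w * per_point).sum() / w.sum())
```

Upweighting the sparse foreground counteracts the dominance of background points in the loss, pushing the denoiser toward sharper object structure in the generated LiDAR scans.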
This research advances 2D and 3D urban perception by enhancing small object detection, improving road extraction accuracy, and facilitating high-fidelity LiDAR data generation. These advancements hold significant implications for urban planning, environmental monitoring, and intelligent transportation systems, contributing to the development of more sustainable urban environments. The findings highlight the potential of integrating deep learning techniques into urban environmental management, offering actionable insights for planners and policymakers.