eScholarship
Open Access Publications from the University of California

UC Davis Electronic Theses and Dissertations

Task-Driven Adaptation of Deep Learning Architectures

Abstract

Deep learning (DL) is one of the most widespread frameworks for solving problems in both thriving areas such as image recognition and long-standing areas such as salinity level estimation for water planning. DL neural network models establish rules through automatic data analysis and unsupervised feature extraction and generalize to unseen data; by fusing in domain knowledge and human experience, existing models can be adapted to specific DL tasks for boosted performance and better interpretability. This dissertation aims to address some existing obstacles in practice, with a focus on two themes: image compression and recognition in band-limited networks, and water salinity modeling in the Sacramento-San Joaquin Delta (Delta), California. When deploying learning-based image classifiers in distributed wireless Internet of Things (IoT) systems such as remote camera deployments, effective feature extraction is critical for efficient bandwidth utilization. In the first part of this dissertation, we develop task-aware image compression codecs for edge nodes in IoT systems.

Massive deployment of low-cost IoT devices in networked artificial intelligence applications must overcome the limited computation and storage capacities of sensor terminals, motivating studies on image codecs that efficiently encode source images for transport over bandwidth-constrained network links to cloud nodes responsible for complex computations. However, traditional standardized codecs such as JPEG were designed for human end users based on subjective tests, not for machine learning. Under limited storage and transport bandwidth, we aim to adapt the popular JPEG codecs for joint image compression and classification. Our novel end-to-end deep learning framework can optimize widely deployed JPEG codecs to improve classification accuracy over standard JPEG settings. This integrative framework simplifies training and classification by directly leveraging the stored or received JPEG images in the frequency domain during learning, bypassing the unnecessary step of image reconstruction.
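The frequency-domain representation at the heart of this idea is the block DCT that JPEG already computes. The sketch below (illustrative only; the dissertation's actual codec and classifier are not reproduced here) shows the orthonormal 8x8 2-D DCT a JPEG encoder applies, producing the coefficients a downstream classifier could consume directly without an inverse transform back to pixels.

```python
import math

def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block, as used in JPEG.

    A frequency-domain classifier would take coefficients like these as
    input, skipping inverse-DCT image reconstruction entirely.
    """
    N = 8

    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for i in range(N):
                for j in range(N):
                    s += (block[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * j + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs

# A flat (constant) block has all its energy in the DC coefficient:
flat = [[100.0] * 8 for _ in range(8)]
C = dct2_8x8(flat)
print(round(C[0][0], 3))        # 800.0 (DC term: 8 * 100)
print(abs(round(C[0][1], 6)))   # 0.0 (AC terms vanish for a constant block)
```

In the proposed framework it is the JPEG quantization of such coefficients, not the transform itself, that is tuned end-to-end for classification accuracy.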

On the other hand, neural-network-based image compression codecs, which usually deliver more promising performance, also play a critical role in remote camera applications. Yet several practical challenges remain in distributed DL over band-limited channels. Specifically, many IoT systems consist of sensor nodes for raw data collection and encoding, and servers for learning and inference tasks. The first challenge is that adaptation of DL over band-limited network data links has received only scant attention. The second is the need for pre-deployed encoders to remain compatible with flexible decoders that can be upgraded or retrained. The third is robustness against erroneous training labels. Addressing these three challenges, we attach a side branch to the vanilla auto-encoder model and develop a hierarchical learning strategy to guide the encoder via this side path. Experimental results show that our hierarchically trained models can improve link spectrum efficiency without performance loss, reduce storage and computational complexity, and achieve robustness against training label corruption.
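A minimal forward-pass sketch of this topology, with hypothetical toy dimensions and plain-Python linear layers (the dissertation's actual networks are convolutional and much larger): an encoder on the sensor node produces a compact latent, the main path reconstructs the input as in a vanilla auto-encoder, and the side branch classifies from the same latent to guide encoder training.

```python
import random

random.seed(0)

def dense(x, W, b):
    """y = W x + b for plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, u) for u in v]

def rand_layer(n_out, n_in):
    return ([[random.uniform(-0.1, 0.1) for _ in range(n_in)]
             for _ in range(n_out)],
            [0.0] * n_out)

# Toy sizes (hypothetical; real model dimensions differ).
D_IN, D_LATENT, N_CLASSES = 16, 4, 3

enc_W, enc_b = rand_layer(D_LATENT, D_IN)         # encoder on the sensor node
dec_W, dec_b = rand_layer(D_IN, D_LATENT)         # main path: reconstruction
side_W, side_b = rand_layer(N_CLASSES, D_LATENT)  # side branch: classification

x = [random.uniform(0, 1) for _ in range(D_IN)]
z = relu(dense(x, enc_W, enc_b))       # compact latent sent over the link
x_hat = dense(z, dec_W, dec_b)         # vanilla auto-encoder output
logits = dense(z, side_W, side_b)      # side-path guidance signal

# Hierarchical training (conceptually): first optimize the encoder with the
# side branch on the task loss, then fix the encoder and train or upgrade
# decoders against it, keeping pre-deployed encoders compatible.
print(len(z), len(x_hat), len(logits))  # 4 16 3
```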

Next, we identify another important challenge: how to effectively train such distributed models when the training samples undergo distortive transformations and the connecting channels have limited rate/capacity. Our goal is to optimize the DL model so that the encoder latent requires low channel bandwidth while still delivering transform-invariant feature information for high classification accuracy. This work proposes a three-step joint learning strategy that guides encoders to extract features that are compact, discriminative, and amenable to common augmentations/transformations. We optimize the latent dimension through an initial screening phase before end-to-end (E2E) training. To obtain an adjustable bit rate from a single pre-deployed encoder, we apply entropy-based quantization and/or manual truncation to the latent representations. The trained models also exhibit robustness to such latent quantization and truncation.
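The rate-adjustment mechanics can be sketched in a few lines (a simplification under assumed parameters; the dissertation uses entropy-based quantizer design rather than the fixed uniform step shown here). Quantization coarsens each latent entry, and truncation keeps only the leading dimensions, so one deployed encoder serves several bit rates:

```python
def quantize(latent, step=0.25):
    """Uniform scalar quantization of latent entries (illustrative step size)."""
    return [round(v / step) * step for v in latent]

def truncate(latent, k):
    """Keep only the first k dimensions. Training is meant to concentrate
    discriminative information in the leading entries, so dropping the
    tail trades a little accuracy for a lower transmission rate."""
    return latent[:k]

z = [0.91, -0.34, 0.07, 0.52, -0.18, 0.02]   # hypothetical encoder latent
z_q = quantize(z)
z_low_rate = truncate(z_q, 3)
print(z_q)          # [1.0, -0.25, 0.0, 0.5, -0.25, 0.0]
print(z_low_rate)   # [1.0, -0.25, 0.0]
```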

In the second part of this dissertation, we turn to DL applications where training data is insufficient. Reliability of DL models usually comes with the prerequisite of massive annotated training data; for example, the generalization capability of the DL models discussed above relies on tens of thousands of training samples. However, acquisition of task-specific annotated data can be costly in terms of experimental resources, human labor, and user privacy, which calls for the few-shot learning (FSL) paradigm, where models learn data representations effectively from a limited number of samples. In addition, accurate label information may conflict with the intrinsic features in the data and hence become misleading when training the embedding extractors. To alleviate model dependence on labeled data and address the common overfitting problem of FSL in computer vision, we again integrate the side path into the encoder to ensure extraction of linearly discriminative embeddings. Moreover, to mitigate the disagreement between categorical labels on the classifier end and underlying patterns on the encoder side, we propose incorporating coarse-grained instead of fine-grained labels into the embedding regularizer term. The proposed regularizer reduces overfitting and improves test accuracy over E2E cross-entropy (CE) training or its fine-grained variant, especially for deeper models, which are more prone to overfitting. This regularizer works better when there is less manual intervention and more randomness in the coarse label assignment, which in turn supports our claim that the inherent discriminative characteristics of data may not be well detected via straightforward E2E label-based training.
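One plausible form of such a coarse-label regularizer is a center-based penalty, sketched below with a hypothetical fine-to-coarse grouping (the grouping, names, and exact penalty form are illustrative assumptions, not the dissertation's specification): embeddings are pulled toward the mean of their coarse group rather than their fine class, imposing a weaker, less label-dependent constraint.

```python
# Hypothetical fine -> coarse grouping; the abstract notes that more random
# coarse assignments can work even better than curated ones.
coarse_of = {"cat": "animal", "dog": "animal",
             "truck": "vehicle", "car": "vehicle"}

def coarse_center_penalty(embeddings, fine_labels):
    """Mean squared distance of each embedding to its coarse-group center."""
    groups = {}
    for e, y in zip(embeddings, fine_labels):
        groups.setdefault(coarse_of[y], []).append(e)
    centers = {g: [sum(col) / len(es) for col in zip(*es)]
               for g, es in groups.items()}
    total = 0.0
    for e, y in zip(embeddings, fine_labels):
        c = centers[coarse_of[y]]
        total += sum((ei - ci) ** 2 for ei, ci in zip(e, c))
    return total / len(embeddings)

# Toy 2-D embeddings: the two vehicles already coincide, the two animals
# are spread around their shared coarse center.
emb = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 4.0]]
pen = coarse_center_penalty(emb, ["cat", "dog", "truck", "car"])
print(pen)  # 0.25
```

Added to the CE loss, a term like this regularizes the embedding space without forcing every fine class apart, which is the intuition behind its reduced overfitting.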

In the third part of this dissertation, we shift to the conventional field of water salinity modeling. Domain-specific architectures of multi-layer perceptron (MLP) artificial neural networks (ANNs) have been developed as computer emulators for a commonly used process model, the Delta Simulation Model II (DSM2), for fast salinity level estimation at key monitoring stations in the Delta. However, achieving promising prediction results and fast inference speed at the same time can be challenging with an insufficient amount of training samples and/or inevitable measurement noise in the observed dataset. To begin with, we propose three major enhancements to the existing ANN architecture, aimed at reducing training time, reducing estimation error, and improving feature extraction. In particular, we design a novel multi-task ANN architecture with shared hidden layers for joint salinity estimation at multiple stations, achieving a 90% reduction in training and inference time. As another major structural redesign, we replace pre-determined pre-processing of the input data with a trainable convolutional layer. We further enhance the multi-task ANN design and training for salinity forecasting. These enhancements substantially improve the efficiency and expand the capacity of the current salinity-modeling ANNs in the Delta. Our enhanced ANN design methodologies have the potential for incorporation into current modeling practice and can provide more robust and timely information to guide water resource planning and management in the Delta.
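The shared-hidden-layer idea can be sketched as follows (toy dimensions and random weights are assumptions; input features in practice would be Delta flows, gate operations, tides, etc.): one trunk of hidden layers is computed once per input, and only a small scalar head is evaluated per station, which is where the time savings over one full ANN per station come from.

```python
import random

random.seed(1)

def dense(x, W, b):
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, u) for u in v]

def rand_layer(n_out, n_in):
    return ([[random.uniform(-0.1, 0.1) for _ in range(n_in)]
             for _ in range(n_out)],
            [0.0] * n_out)

# Hypothetical sizes: inputs -> shared layers -> one head per station.
D_IN, D_HID, N_STATIONS = 8, 6, 4

W1, b1 = rand_layer(D_HID, D_IN)   # shared hidden layers serve all stations
W2, b2 = rand_layer(D_HID, D_HID)
heads = [rand_layer(1, D_HID) for _ in range(N_STATIONS)]

x = [random.uniform(0, 1) for _ in range(D_IN)]  # e.g. flows, gates, tide
h = relu(dense(x, W1, b1))                       # trunk computed once
h = relu(dense(h, W2, b2))
salinity = [dense(h, Wk, bk)[0] for Wk, bk in heads]  # cheap per-station heads
print(len(salinity))  # 4
```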

The enhanced ANN produces adequate estimation accuracy on DSM2-simulated data, but its performance degrades when applied to field observations due to data insufficiency and measurement noise. For further performance gains and inference acceleration, we develop novel DL models, called "Res-RNNs", by attaching a residual shortcut path of recurrent neural network (RNN) layers to the vanilla MLP ANN architecture. The proposed Res-RNNs capture spatial variations with the main MLP path and handle temporal information with the assistance of the RNN side path, hence providing better performance than MLP models. Our work demonstrates the feasibility of DL-based models in supplementing existing operational simulators by providing more accurate and real-time salinity estimates to inform water management decision-making.
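A minimal forward-pass sketch of this two-path structure, under assumed toy dimensions and a simple tanh recurrence (the dissertation's RNN layers may differ): the main MLP path sees the flattened input window, the recurrent side path walks the same window step by step, and their outputs combine residually.

```python
import math
import random

random.seed(2)

def dense(x, W, b):
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def rand_layer(n_out, n_in):
    return ([[random.uniform(-0.1, 0.1) for _ in range(n_in)]
             for _ in range(n_out)],
            [0.0] * n_out)

D_IN, D_HID, T = 4, 5, 7   # features per day, hidden width, history length

W1, b1 = rand_layer(D_HID, D_IN * T)   # main MLP path on the flat window
W2, b2 = rand_layer(1, D_HID)
Wx, bx = rand_layer(D_HID, D_IN)       # RNN side path over the same window
Wh, bh = rand_layer(D_HID, D_HID)
Wo, bo = rand_layer(1, D_HID)

seq = [[random.uniform(0, 1) for _ in range(D_IN)] for _ in range(T)]

# Main path: spatial patterns from the whole flattened input window.
flat = [v for day in seq for v in day]
h = [max(0.0, u) for u in dense(flat, W1, b1)]
y_mlp = dense(h, W2, b2)[0]

# Side path: a simple tanh RNN tracks temporal dynamics step by step.
state = [0.0] * D_HID
for x_t in seq:
    pre = [a + c for a, c in zip(dense(x_t, Wx, bx), dense(state, Wh, bh))]
    state = [math.tanh(p) for p in pre]
y_rnn = dense(state, Wo, bo)[0]

# Residual combination: the RNN shortcut refines the MLP estimate.
y_hat = y_mlp + y_rnn
print(isinstance(y_hat, float))  # True
```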

Overall, this dissertation reveals the potential of adapting existing DL model architectures to downstream tasks to achieve interpretable, robust, and timely results in both the emerging area of learning-based image recognition and the classical area of water modeling.
