With the recent success of deep learning, neural models have achieved superior performance and now dominate natural language understanding and generation tasks. Because many of these models are black-box mappings from input to output, it is increasingly important to understand how confident a model is in its predictions and how robust it remains under distribution shift. Uncertainty estimation methods allow us to separately quantify epistemic uncertainty, which arises from inadequate knowledge about the model, and aleatoric uncertainty, which is the inherent, irreducible uncertainty in the data. With these estimates, we can develop uncertainty-aware approaches that improve a model's robustness. A closely related concept, calibration, measures how well model confidence aligns with prediction accuracy. A better-calibrated model is more robust because its predicted confidence scores can be interpreted more reliably. Another important aspect of the robustness of a deep learning model is its ability to adapt to distribution shift: when the test distribution differs significantly from the training distribution, the ability to detect and adjust accordingly is vital in practical applications.
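As a concrete illustration of calibration, the following minimal sketch (not taken from this dissertation; the function name, binning scheme, and example inputs are assumptions made purely for exposition) computes an expected calibration error by grouping predictions into confidence bins and averaging the gap between each bin's accuracy and its mean confidence:

```python
# Illustrative sketch: expected calibration error (ECE) bins predictions by
# confidence and averages the |accuracy - confidence| gap, weighted by bin size.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: max predicted probability per example; correct: 1/0 per example."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its fraction of examples
    return ece

# A model whose confidence systematically exceeds its accuracy yields a larger ECE.
print(expected_calibration_error([0.9, 0.8, 0.95, 0.7], [1, 0, 1, 1]))
```

A perfectly calibrated model would have zero gap in every bin; the further confidence drifts from accuracy, the larger this score grows.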
In this dissertation, we first examine the benefits of applying uncertainty quantification methods to sentiment analysis, named entity recognition, and language modeling. We show that incorporating uncertainty estimation into the modeling process yields significant improvements on these three important NLP tasks. We then draw connections between hallucination and predictive uncertainty and empirically investigate their relationship in image captioning and data-to-text generation. Next, we investigate the relationship between model calibration and label smoothing in document classification. We further acknowledge the importance of learning under distribution shift by introducing a benchmark that evaluates models on their ability to estimate changes in the label distribution in classification settings. Finally, we summarize our findings and discuss potential future research directions for uncertainty-aware learning and model robustness in NLP.
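To make the label-smoothing connection concrete, here is a minimal sketch (not the dissertation's implementation; the function name, smoothing parameter, and example are assumptions for illustration) of how a one-hot target is mixed with the uniform distribution, which discourages the model from assigning probability 1.0 to a single class and thereby tends to reduce overconfidence:

```python
# Illustrative sketch: label smoothing replaces a one-hot target with a mixture of
# the one-hot vector and the uniform distribution over classes.
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    """one_hot: target of shape (num_classes,); epsilon: smoothing strength."""
    num_classes = one_hot.shape[0]
    return (1.0 - epsilon) * one_hot + epsilon / num_classes

print(smooth_labels(np.array([0.0, 1.0, 0.0, 0.0])))  # -> [0.025, 0.925, 0.025, 0.025]
```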
The methods and analyses in this dissertation allow for a better understanding of uncertainty and robustness in deep learning for natural language processing tasks. We envision a future where AI systems are explainable and accountable.