Towards Robust and Generalizable Machine Learning Models
- Wang, Yihan
- Advisor(s): Hsieh, Cho-Jui
Abstract
With the rapid advancement and widespread adoption of machine learning technologies, concerns about model robustness and generalization have become increasingly significant. This dissertation discusses and addresses critical challenges in developing machine learning models that are both robust against adversarial examples and capable of generalizing across distribution shifts. We first investigate certified robustness methods for neural networks, proposing Fast-IBP, a novel approach that achieves state-of-the-art certified robust accuracy with significantly reduced training time through improved initialization and normalization techniques. We also provide a theoretical analysis of the limitations of width scaling in Interval Bound Propagation training. Moving beyond traditional robustness concerns, we explore challenges in the responsible deployment of large language models (LLMs), developing comprehensive red-teaming tests for popular LLM text detection methods and proposing a training-free backtranslation defense against jailbreaking attacks. Finally, we extend our discussion from robustness in the deployment stage to the training stage. We provide a theoretical analysis of the limitations of prompt-tuning, characterizing its representation power and identifying its failure cases. We also identify format overfitting as a partial explanation for language model overfitting when fine-tuned on specific downstream tasks, and introduce PROMOT, a two-stage fine-tuning strategy that mitigates over-specialization while maintaining or improving in-context learning performance on tasks not seen during fine-tuning. Moving from single-task fine-tuning to general instruction fine-tuning, we identify a loss of context awareness in language models after instruction fine-tuning. Based on empirical observations, we propose a method to identify context-dependent training examples and mitigate this performance loss.
Our work contributes significant advances toward building machine learning systems that are not only powerful but also reliable, safe, and aligned with human values.