Attribute Representation in Neural Language Models
Neural models, including neural language models and encoder-decoder models, are the backbone of current natural language processing (NLP) research. Large pre-trained models have greatly improved performance on both language understanding and generation across many NLP tasks. However, the information encoded in pre-trained models does not transfer easily to a target task; using it typically requires fine-tuning on domain-specific data. Because of the domain shift between the pre-training data and the task data, it remains challenging for models to adapt to downstream tasks, especially when training examples are limited, as in few-shot and zero-shot settings. More importantly, fine-tuned models diverge from the original pre-trained model, so they work well for only a small number of domains and are prone to overfitting. Although scaled-up models with billions or trillions of parameters have shown promising performance when given prompts and examples, these challenges remain.
In this thesis, we study whether we can learn attribute representations and inject them into pre-trained neural models to address these challenges. In contrast to a black-box model, whose parameters contain vast but opaque world knowledge, a learned attribute representation can guide the model toward information relevant to the target task, or serve as supplementary information alongside the original parameters. Attributes can be as high-level as a language representation in a multilingual transfer learning setting, or as low-level as span or ontology representations. This direction is appealing because we can introduce new attributes to a pre-trained model without changing the original trained parameters. As we will show, learning attribute representations is efficient in both computation and data requirements. Moreover, it makes transfer learning possible with only a few examples while maintaining the quality of the original model. We believe that training attribute representations is a critical step toward closing the gap between pre-training neural models and applying them to target tasks.
Specifically, we first introduce methods to represent high-level attributes. These learned attributes can be distinguished from other, similar attributes, so they can be used to transfer useful knowledge across domains and to steer a neural model toward particular understanding and generation behavior. Then, we discuss how to represent low-level attributes from pre-trained models. These attributes are hidden within pre-trained models and expressed through latent representations. Such a representation can either be used directly for target tasks by identifying significant features, or be incorporated into further model training. Next, moving beyond these more concrete attributes, we propose methods to integrate task specifications for efficient modeling. These task-specific attributes model the target task directly, tightly linking the model representation to the prediction goal and enabling performance close to or even above human level. Lastly, we apply these attribute representations to dialog systems as a case study. We demonstrate how representing different kinds of attributes allows us to build a dialog system from scratch. Taken together, these methods address several of the most critical challenges in adapting neural language models.