Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Text Understanding and Question Answering for Consumer Health Applications


The overarching problem that Natural Language Processing (NLP) research tries to solve is understanding linguistic constructs and their meaning. There has been tremendous progress in recent years in contextual text representations, leading to the emergence of self-attention and language models. Language models have pushed the state of the art in question answering, natural language understanding, and a range of other NLP tasks. Although language models have revolutionized the way text is represented, they need large amounts of training data, and they may not understand text written in informal, non-mainstream styles. As a result, knowledge-hungry domains such as healthcare and underrepresented users such as older adults have not benefited from this progress.

This dissertation introduces methods that aim to enable language models to adapt to domains that take user-generated text as input, require specialized knowledge, and offer only challenging, noisy, or small training datasets. In particular, I develop methods for text understanding and question answering for consumer health applications, whose users are consumers of medical language technology systems. First, I tackle Answer Sentence Selection through recursive language models. I show that the popular transformer architecture can leverage tree structures in formally written text, yet fails to do so in informal, user-written text. Then, to better understand user-written questions, or Consumer Health Questions, I propose a new parameter-sharing method that jointly trains question summarization and entailment for the medical domain. Afterwards, I bring together answer selection and question understanding to design a system for medical Question Understanding and Answering. The proposed system takes a long, user-written medical question as input and selects the best answer from a medical knowledge base using self-supervised losses. Finally, I study text understanding through the lens of entity linking for utterances written by users on social media.
