eScholarship
Open Access Publications from the University of California

UC Irvine

UC Irvine Electronic Theses and Dissertations

Computational Analysis of Health Text

No data is associated with this publication.
Creative Commons 'BY-SA' version 4.0 license
Abstract

Health text, ranging from patient-generated online forum posts to clinician-authored unstructured notes, contains valuable information that can potentially improve healthcare service quality, patient experiences, and patient and population health outcomes. Health text data are also highly heterogeneous: they are produced in different contexts and serve different purposes, which requires careful study design and methodological innovation to ensure study validity. However, current practices of computational analysis of health text are often inconsistent and give little consideration to the contexts in which health text is produced.

My dissertation includes three major studies that analyzed different types of health text, including publicly generated social media data and clinical notes of patients with rare diseases. In the first study, I conducted a systematic literature review that revealed multiple issues in how computational sentiment analysis is currently applied to health-related social media data. I also comprehensively evaluated commonly used sentiment analysis tools on several social media datasets and found that they failed to accurately label the sentiments conveyed in health-related social media data. In the second study, I developed and applied computer-assisted qualitative analysis pipelines to analyze health-related social media data, including tweets and online physician reviews. The results identified public attitudes and concerns toward mask wearing during the COVID-19 pandemic, as well as patient concerns around healthcare service quality. These insights contribute to better public health communication strategies and to ways of enhancing patients’ experiences when interacting with healthcare systems. In the third study, I developed a pipeline that extracts various clinical entities, including diagnosis, environmental exposures, substance use, performance status, and staging, from unstructured notes of patients with lymphoid malignancies. The pipeline achieved satisfactory performance. An error analysis identified issues with current documentation practices for key clinical information and provided recommendations for future improvement of the pipeline. The extracted clinical entities will be further used to facilitate clinical research on the association between environmental exposures and cancer outcomes.

Collectively, these studies contribute a set of methodological and empirical insights into how to design and choose an appropriate computational method for analyzing different types of health text data. Moving forward, my future work will integrate and adapt emerging large language models into health text analysis, assess their performance, and identify potential biases when analyzing different types of health text from various patient populations.

Main Content

This item is under embargo until August 21, 2024.