Ontology-Based Analysis of Online Healthcare Data
- Author(s): Wiley, Matthew Thomas
- Advisor(s): Hristidis (Christidis), Vagelis (Evangelos)
- et al.
The widespread adoption of electronic health records, combined with the surge of online healthcare data, has created unique data analysis challenges at the intersection of computing and healthcare. These challenges include extracting meaningful concepts from clinical notes and online social networks, as well as defining scalable algorithms and knowledge discovery techniques that utilize domain-specific knowledge representations, such as biomedical ontologies. Several ontologies have been built for the healthcare domain; these include information on diseases, procedures, and drugs, and on the relationships between them.
As a first research contribution, we study how to efficiently find medical documents semantically similar to a given document. An application of this is finding patients similar to a current patient.
We define a novel algorithm for computing similarity between two sets of documents, where each document is a set of medical concepts represented by an ontology. We evaluate the scalability and performance of our methods using a real dataset of electronic health records.
Our second research contribution studies how to predict medical concepts in a patient’s health record. To do so, we consider the sequence of notes in the current patient’s health record and use the records of similar patients to predict the current patient’s future diagnoses.
Our third contribution analyzes the relationship between the characteristics of an online health social network, such as moderation or anonymity, and its content, focusing on pharmaceutical drug discussions. The proposed techniques include novel methods for extracting and analyzing medical concepts from social media posts. We evaluate these techniques on several online social networks and show how each type of online social network influences its pharmaceutical-related discussions.
Lastly, we propose a data-driven analysis to discover how the quality indicators of individual healthcare providers, such as peer awards, are associated with a rich set of attributes, such as years of experience, found in publicly available datasets. Our proposed analysis pipeline includes novel methods for mapping entities across multiple sources, building classifiers of provider quality, and identifying localized attributes of provider quality.