Skip to main content
eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations bannerUCLA

End-to-End Machine Learning Frameworks for Medicine: Data Imputation, Model Interpretation and Synthetic Data Generation

Abstract

Tremendous successes in machine learning have been achieved in a variety of applications such as image classification and language translation via supervised learning frameworks. Recently, with the rapid increase of electronic health records (EHR), machine learning researchers got immense opportunities to adopt the successful supervised learning frameworks to diverse clinical applications. To properly employ machine learning frameworks for medicine, we need to handle the special properties of the EHR and clinical applications: (1) extensive missing data, (2) model interpretation, (3) privacy of the data. This dissertation addresses those specialties to construct end-to-end machine learning frameworks for clinical decision support.

We focus on the following three problems: (1) how to deal with incomplete data (data imputation), (2) how to explain the decisions of the trained model (model interpretation), (3) how to generate synthetic data for better sharing private clinical data (synthetic data generation). To appropriately handle those problems, we propose novel machine learning algorithms for both static and longitudinal settings. For data imputation, we propose modified Generative Adversarial Networks and Recurrent Neural Networks to accurately impute the missing values and return the complete data for applying state-of-the-art supervised learning models. For model interpretation, we utilize the actor-critic framework to estimate feature importance of the trained model's decision in an instance level. We expand this algorithm to active sensing framework that recommends which observations should we measure and when. For synthetic data generation, we extend well-known Generative Adversarial Network frameworks from static setting to longitudinal setting, and propose a novel differentially private synthetic data generation framework.

To demonstrate the utilities of the proposed models, we evaluate those models on various real-world medical datasets including cohorts in the intensive care units, wards, and primary care hospitals. We show that the proposed algorithms consistently outperform state-of-the-art for handling missing data, understanding the trained model, and generating private synthetic data that are critical for building end-to-end machine learning frameworks for medicine.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View