eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Data Standardization, Federated Learning, and Informed Consent Algorithms and Tools to Honor Patient Privacy and Preferences in Clinical Research

Abstract

There is growing public awareness and concern about patient privacy and the potential risks of sharing clinical data for research. While investigators try to collect large amounts of data to increase statistical power and the diversity of the study population, patients strive to gain more control of their data in a way that respects their preferences; they want assurances that their privacy will be protected. In this dissertation, I show how we can satisfy both researchers and patients through the novel use and development of data standardization, federated learning, and tiered informed consent algorithms and tools to foster clinical research while protecting patient privacy and preferences. Chapter 1 is an introduction that consists of research background, significance, problem statement, objectives, and thesis organization. Chapter 2 illustrates a distributed, federated network of 12 health systems that harmonized their electronic health records to a common data model to answer clinical questions related to COVID-19 and post these answers online in a privacy-preserving manner. This network is composed of horizontally partitioned data (i.e., complete data about a set of patients are located in different sites and these sites cannot share data at the individual level). Chapter 3 presents a new algorithm and implementation of a distributed logistic regression model for vertically partitioned data (i.e., partial data about a patient are located in different participating sites and these sites cannot share those data at the individual level). Chapter 4 delineates how a source database of medical records can be transformed to a destination database following a common data model, under the constraint that an external expert team cannot access individual-level data.
While these chapters describe how various institutions could collaborate without sharing individual-level data, Chapter 5 explores whether it is feasible for patients to describe their sharing preferences so that a healthcare system can share data according to these preferences, and compares different ways to elicit these preferences. We found that most patients are willing to share their data and biospecimens for research even when they are given the choice. Chapter 6 provides a conclusion and future directions for this work.
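
The distinction between horizontally and vertically partitioned data drawn in Chapters 2 and 3 can be illustrated with a toy sketch. It is not the dissertation's algorithm or code: the records, field names, and helper functions below are hypothetical, and the pooled mean merely stands in for the kind of aggregate-only exchange that federated analyses rely on.

```python
# Hypothetical toy records; field names are illustrative assumptions,
# not taken from the dissertation.
records = [
    {"patient_id": 1, "age": 54, "lab_value": 7.1},
    {"patient_id": 2, "age": 61, "lab_value": 5.9},
    {"patient_id": 3, "age": 47, "lab_value": 6.4},
    {"patient_id": 4, "age": 70, "lab_value": 8.2},
]


def horizontal_partition(rows, n_sites):
    """Each site holds complete records for a disjoint subset of patients."""
    return [rows[i::n_sites] for i in range(n_sites)]


def vertical_partition(rows, key, column_groups):
    """Each site holds some columns for every patient, plus the linking key."""
    return [
        [{key: row[key], **{c: row[c] for c in cols}} for row in rows]
        for cols in column_groups
    ]


def local_summary(rows, column):
    """Computed at a site: only a count and a sum leave the site."""
    return {"n": len(rows), "total": sum(r[column] for r in rows)}


def pooled_mean(summaries):
    """Computed by a coordinator from site-level aggregates only."""
    n = sum(s["n"] for s in summaries)
    return sum(s["total"] for s in summaries) / n


h_sites = horizontal_partition(records, 2)
v_sites = vertical_partition(records, "patient_id", [["age"], ["lab_value"]])

# Horizontally partitioned sites can answer a question without sharing rows.
mean_age = pooled_mean([local_summary(site, "age") for site in h_sites])
print(mean_age)
```

In the horizontal case each site can compute a summary on its own rows; in the vertical case no single site holds all the variables for a patient, which is why a model such as logistic regression needs a dedicated distributed algorithm there.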
