Enabling Synthetic Data Usage for Medical Research
Acquiring data can be a major hurdle in any data science project. Sometimes there isn’t enough data or, as is particularly the case for healthcare data, it may contain sensitive information such as personal identifiers that should not be shared. By generating synthetic health data, researchers aim to overcome obstacles of data access and privacy and thereby allow quicker and broader use of data by the research community. In this thesis I surveyed the current state of synthetic data usage in medical research; interviewed medical researchers to record their thoughts, experiences, and opinions on synthetic data use; selected synthetic data generation tools; assessed the accessibility, usability, and efficacy of the selected data generation tool with the help of two different use case groups; and experimented with creative ways to use the chosen tool. I then used these experiences to write resources for current and future researchers who need assistance getting started with synthetic data generation through the UC Davis DataLab.