The Impact of Synthetic and Real Training Data on Model Vulnerability
eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Abstract

Membership inference attacks threaten the privacy of records used to train machine learning models by enabling an adversary to determine whether a given record was part of a model's training set. In this paper, we explore the use of synthetic training data as a defense against this form of attack. The synthetic data preserves the attributes of the original training data set while maintaining machine learning utility. We use CTGAN and DP-CTGAN to generate high-quality tabular synthetic training data. We evaluate the effectiveness of this approach empirically by comparing the vulnerability and utility of models trained on synthetic data with those of models trained on real data. We also analyze the privacy-utility trade-off that comes with using synthetic data. Synthetic data appears to be a promising defense against membership inference attacks, providing increased privacy at a reasonable loss in utility.
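
To make the notion of model vulnerability concrete, the following is a minimal sketch of a confidence-threshold membership inference baseline. It is not taken from the thesis: the function name, the threshold rule, and the toy confidence values are all assumptions chosen for illustration. The attacker guesses "member" whenever the model's confidence on a record is at or above a threshold, and vulnerability is summarized as the attack advantage (true positive rate minus false positive rate).

```python
from typing import Sequence, Tuple


def threshold_attack_advantage(
    member_conf: Sequence[float],
    nonmember_conf: Sequence[float],
    threshold: float,
) -> Tuple[float, float, float]:
    """Hypothetical helper: score a confidence-threshold membership attack.

    member_conf    -- model confidence on records that WERE in the training set
    nonmember_conf -- model confidence on records that were NOT in the training set
    threshold      -- attacker predicts "member" when confidence >= threshold

    Returns (true positive rate, false positive rate, advantage),
    where advantage = TPR - FPR. An advantage near 0 means the attacker
    does little better than guessing.
    """
    if not member_conf or not nonmember_conf:
        raise ValueError("both confidence lists must be non-empty")

    # Fraction of true members the attacker correctly flags as members.
    tpr = sum(c >= threshold for c in member_conf) / len(member_conf)
    # Fraction of non-members the attacker wrongly flags as members.
    fpr = sum(c >= threshold for c in nonmember_conf) / len(nonmember_conf)
    return tpr, fpr, tpr - fpr


# Toy, made-up confidences purely for illustration (not experimental results).
toy_members = [0.99, 0.95, 0.9, 0.6]
toy_nonmembers = [0.9, 0.7, 0.5, 0.4]
tpr, fpr, advantage = threshold_attack_advantage(toy_members, toy_nonmembers, 0.8)
print(f"TPR={tpr:.2f} FPR={fpr:.2f} advantage={advantage:.2f}")
```

Under this kind of metric, one would compare the advantage obtained against a model trained on real data with the advantage obtained against a model trained on synthetic data, alongside each model's accuracy on held-out data, to read off a privacy-utility trade-off.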
