Membership inference attacks threaten the privacy of the records used to train machine learning
models by enabling adversaries to determine whether a given record was part of a model's training set.
In this paper, we explore the use of synthetic training data as a defense against this
form of attack. The synthetic data preserves the attributes of the original training dataset while
maintaining machine learning utility. We use CTGAN and DP-CTGAN to generate
high-quality tabular synthetic training data. We evaluate the effectiveness of this approach
empirically by comparing the vulnerability and utility of models trained with synthetic and
real data. We also analyze the privacy-utility trade-off that comes with using synthetic data.
Synthetic data appears to be a promising defense against membership inference
attacks, providing increased privacy at a reasonable cost in utility.
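
To make the notion of vulnerability concrete, the following is a minimal sketch of one simple
membership inference attack: guess that a record is a member when the model's confidence on it
exceeds a threshold, and score the attack by its true-positive rate minus its false-positive rate.
This is an illustrative assumption, not necessarily the attack evaluated in this paper. The function
names, the threshold, and all confidence values are hypothetical toy inputs, not experimental results.

```python
# Illustrative sketch only: a confidence-threshold membership inference attack.
# All names and numbers here are hypothetical; they are not results from the paper.

def threshold_attack(confidences, threshold):
    """Guess 'member' (True) for each record whose confidence exceeds the threshold."""
    return [c > threshold for c in confidences]

def membership_advantage(member_conf, nonmember_conf, threshold):
    """Attack advantage: true-positive rate minus false-positive rate."""
    tpr = sum(threshold_attack(member_conf, threshold)) / len(member_conf)
    fpr = sum(threshold_attack(nonmember_conf, threshold)) / len(nonmember_conf)
    return tpr - fpr

# Toy confidences of a model trained directly on the real records.
real_members = [0.99, 0.97, 0.95, 0.70]      # records that were in the training set
real_nonmembers = [0.92, 0.70, 0.66, 0.58]   # held-out records

# Toy confidences of a model trained on synthetic data, queried on the same
# original records (the "members" are the records behind the synthetic data).
synth_members = [0.93, 0.72, 0.65, 0.60]
synth_nonmembers = [0.91, 0.71, 0.64, 0.59]

adv_real = membership_advantage(real_members, real_nonmembers, threshold=0.9)
adv_synth = membership_advantage(synth_members, synth_nonmembers, threshold=0.9)
print(f"advantage (real-data model):      {adv_real:.2f}")
print(f"advantage (synthetic-data model): {adv_synth:.2f}")
```

An advantage near zero means the attack does no better than guessing. In a comparison of this kind,
the same measurement would be paired with a utility metric such as test accuracy for each model.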