eScholarship
Open Access Publications from the University of California

UCLA Electronic Theses and Dissertations

Adversarial Privacy Auditing of Synthetically Generated Data produced by Large Language Models using the TAPAS Toolbox

Abstract

As data collection continues to grow, so does the demand for privacy-preserving synthetic data generation and for privacy auditing techniques that safeguard sensitive user information against privacy attacks. This paper explores adversarial privacy auditing of synthetic data produced by Large Language Models (LLMs) using TAPAS, the “Toolbox for Adversarial Privacy Auditing of Synthetic Data” framework. It uses a breast cancer healthcare dataset containing sensitive user information to evaluate data privacy with adversarial techniques. The paper compares the data quality, data distributions, and privacy-preserving metrics of the real dataset with those of synthetic datasets generated by several sources: LLM-based approaches such as the GReaT framework and OpenAI's GPT-4, Generative Adversarial Networks (GANs), and a proprietary technique from an industry startup, mostly.ai.
