Towards Practical Privacy-Preserving Data Analytics
- Author(s): Johnson, Noah Michael
- Advisor(s): Song, Dawn
- et al.
Organizations are increasingly collecting sensitive information about individuals. Extracting value from this data requires providing analysts with flexible access, typically in the form of databases that support SQL queries. Unfortunately, allowing access to data has been a major cause of privacy breaches.
Traditional approaches to data security cannot protect the privacy of individuals while providing flexible access for analytics. This presents a difficult trade-off: overly restrictive policies result in underutilization and data siloing, while insufficient restrictions can lead to privacy breaches and data leaks.
Differential privacy is widely recognized by experts as the most rigorous theoretical solution to this problem. Differential privacy provides a formal guarantee of privacy for individuals while allowing general statistical analysis of the data. Despite extensive academic research, differential privacy has not been widely adopted in practice. Additional work is needed to address performance and usability issues that arise when applying differential privacy to real-world environments.
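Informally, a randomized mechanism M is ε-differentially private if, for any two databases D and D′ that differ in one individual's record and any set of outputs S, Pr[M(D) ∈ S] ≤ exp(ε) · Pr[M(D′) ∈ S]. A standard way to achieve this for a counting query is the Laplace mechanism. A count has sensitivity 1, because adding or removing one record changes it by at most 1, so adding Laplace noise with scale 1/ε suffices. The sketch below illustrates that textbook mechanism. It is not Chorus's implementation, and it assumes each record belongs to a distinct individual. The function names `laplace_noise` and `private_count` are illustrative only.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return an epsilon-differentially private count of records satisfying `predicate`.

    Assumes one record per individual, so the count has sensitivity 1 and
    Laplace noise with scale 1/epsilon is enough.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Usage: a smaller epsilon gives stronger privacy and a noisier answer.
rng = random.Random(0)
ages = [23, 35, 41, 52, 67]
print(private_count(ages, lambda a: a > 40, epsilon=0.5, rng=rng))  # 3 plus noise of scale 2
```

Python's `random` module is used here only for readability. A deployed system would also need a cryptographically secure noise source and care with floating-point artifacts in the sampler.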
In this dissertation, we develop empirical and theoretical advances towards practical differential privacy. We conduct a study of 8.1 million real-world queries to determine the requirements for practical differential privacy, and identify limitations of previous approaches in light of these requirements. We then propose a novel method for differential privacy that addresses key limitations of previous approaches.
We present Chorus, an open-source system that automatically enforces differential privacy for statistical SQL queries. Chorus is the first system for differential privacy that is compatible with real databases, supports queries expressed in standard SQL, and integrates easily into existing data environments.
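The abstract does not describe how Chorus enforces differential privacy internally. The following is a minimal, hypothetical sketch of the general idea of enforcing it outside an unmodified database. A wrapper accepts only single-table `COUNT(*)` queries, lets a standard database (SQLite via Python's `sqlite3` here) execute them as-is, and adds Laplace noise before the result reaches the analyst. The names `COUNT_QUERY` and `dp_count` are assumptions for illustration. The regular-expression check is a crude stand-in for a real SQL parser.

```python
import random
import re
import sqlite3

# Hypothetical guard (not a real SQL parser): one table, COUNT(*), optional WHERE, no semicolons inside.
COUNT_QUERY = re.compile(
    r"^\s*SELECT\s+COUNT\(\*\)\s+FROM\s+\w+(\s+WHERE\s+[^;]*)?;?\s*$", re.IGNORECASE
)

def dp_count(conn: sqlite3.Connection, sql: str, epsilon: float, rng: random.Random) -> float:
    """Run a single-table COUNT(*) query unmodified, then add Laplace(1/epsilon) noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Reject anything else, including subqueries (a second SELECT) that could touch other tables.
    if not COUNT_QUERY.match(sql) or len(re.findall(r"\bSELECT\b", sql, re.IGNORECASE)) != 1:
        raise ValueError("this sketch only supports single-table COUNT(*) queries")
    (true_count,) = conn.execute(sql).fetchone()  # the database itself is untouched
    scale = 1.0 / epsilon  # sensitivity 1, assuming one row per individual
    return true_count + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

# Usage with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER, city TEXT)")
conn.executemany("INSERT INTO trips VALUES (?, ?)", [(1, "SF"), (2, "SF"), (3, "NYC")])
print(dp_count(conn, "SELECT COUNT(*) FROM trips WHERE city = 'SF'", 1.0, random.Random(7)))
```

Rejecting joins is deliberate. With a join, one individual's row can contribute to many result rows, so the sensitivity of a count is no longer 1. Supporting such queries soundly requires a sensitivity analysis that accounts for joins rather than this fixed bound.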
Our evaluation demonstrates that Chorus supports 93.9% of real-world statistical queries, integrates with production databases without modifications to the database, and scales to hundreds of millions of records. Chorus is currently deployed at a large technology company for internal analytics and GDPR compliance. In this capacity, Chorus processes more than 10,000 queries per day.