As technology has evolved, so too have our privacy needs. AI is now a household name, and machine learning (ML) applications are part of daily life. But an ML model is only as good as the data on which it is trained, and what happens when that data needs to be protected?
Differential privacy (DP) is a rigorous mathematical definition that can be used to provably bound the privacy leakage (or loss) incurred by running a machine learning algorithm. This relatively young field of study has started to gain considerable traction in the ML research community. There remains, however, a narrowing but still precipitous gap between theory and practice that has prevented DP from seeing widespread deployment in the real world. This dissertation proposes several tools and algorithms to bridge this gap.
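For reference, in its standard $(\varepsilon, \delta)$ formulation, a randomized mechanism $\mathcal{M}$ satisfies DP if, for all pairs of datasets $D$ and $D'$ differing in a single record and all measurable sets of outputs $S$,
\[
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] \;+\; \delta,
\]
where smaller $\varepsilon$ and $\delta$ correspond to a tighter bound on the privacy leakage.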
In Chapter 2, we parameterize the privacy loss as a function of the data and investigate how to privately publish these data-dependent DP losses for the objective perturbation mechanism. These data-dependent losses can be significantly smaller than the worst-case DP bound, which justifies adopting a looser worst-case privacy guarantee, and hence achieving better utility, in practice. Chapter 3 then demonstrates how data-dependent DP losses can be used to develop DP algorithms that adapt to favorable properties of the data and thereby achieve a better privacy-utility trade-off. Chapter 4 returns to objective perturbation and equips this time-honored DP mechanism with new tools and privacy analyses that allow it to compete with more modern algorithms.
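As a rough sketch of the mechanism underlying Chapters 2 and 4 (shown here in its classical form; the variants analyzed in this dissertation may differ in the regularizer and noise distribution), objective perturbation releases the minimizer of a randomly perturbed regularized empirical risk over $n$ records,
\[
\hat{\theta} \;\in\; \arg\min_{\theta} \; \frac{1}{n}\sum_{i=1}^{n} \ell(\theta; x_i) \;+\; \frac{\lambda}{2}\,\|\theta\|_2^2 \;+\; \frac{b^{\top}\theta}{n},
\]
where $\ell$ is the per-example loss, $\lambda$ the regularization strength, and $b$ a random vector whose scale is calibrated to the target privacy guarantee. The data-dependent analyses above quantify the privacy leakage of this release for the dataset at hand, rather than for the worst case over all possible datasets.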