Learning Information from Data while Preserving Differential Privacy
In this dissertation, I am going to introduce my work on differentially private
data mining. There are usually two types of privacy preserving data mining algorithms:
query answering algorithms and data publishing algorithms.
We will present three algorithms on two differentially private query answering
problems in Chapter 3. The first problem is to return SNPs most correlated to a disease,
and we provide two efficient algorithms which make an accurate but previously inefficient
algorithm feasible. The second problem is to learn a model minimizing empirical
risk while selecting features. Our algorithm beats start-of-the-art algorithms.
Then we present three algorithms on differentially private data publishing under
two scenarios in Chapter 4. The first algorithm assumes existence of public data, and
assigns weights to the public data so that they are statistically similar to the private
data. Analysis on the weighted public data is more accurate than doing analysis on
the private data directly. The two following algorithms assume that published data are
for supervised learning (one algorithm for classification and the other for regression),
and that the prediction rules are continuous with respect to predictors. The data these
algorithms published perform very well on learning tasks.