Skip to main content
eScholarship
Open Access Publications from the University of California

Learning Information from Data while Preserving Differential Privacy

  • Author(s): Ji, Zhanglong
  • Advisor(s): Elkan, Charles
  • Ohno-Machado, Lucila
  • et al.
Abstract

In this dissertation, I am going to introduce my work on differentially private

data mining. There are usually two types of privacy preserving data mining algorithms:

query answering algorithms and data publishing algorithms.

We will present three algorithms on two differentially private query answering

problems in Chapter 3. The first problem is to return SNPs most correlated to a disease,

and we provide two efficient algorithms which make an accurate but previously inefficient

algorithm feasible. The second problem is to learn a model minimizing empirical

risk while selecting features. Our algorithm beats start-of-the-art algorithms.

Then we present three algorithms on differentially private data publishing under

two scenarios in Chapter 4. The first algorithm assumes existence of public data, and

assigns weights to the public data so that they are statistically similar to the private

data. Analysis on the weighted public data is more accurate than doing analysis on

the private data directly. The two following algorithms assume that published data are

for supervised learning (one algorithm for classification and the other for regression),

and that the prediction rules are continuous with respect to predictors. The data these

algorithms published perform very well on learning tasks.

Main Content
Current View