Skip to main content
eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations bannerUC San Diego

Private Graph Statistics and Algorithms in Modern Applications

Abstract

In many modern machine learning applications, users hold data on a local device and communicate with an untrusted central analyst. Local differential privacy (local DP) is the state-of-the art way to ensure that all data sent by a user will be protected regardless of who eventually sees the data. In this dissertation, I will focus on collecting certain graph statistics, such as the degree vector and number of triangles in the graph, under local DP. I will obtain formal performance guarantees in the presence of different real-world constraints.

First, I will detail a one-round triangle counting algorithm which is simple, but not guaranteed to work well on sparse graphs. To improve these matters, I will propose a two-round algorithm with improved performance for sparse graphs. Unfortunately, it may have prohibitively large per-user download cost. Second, I will show how one can use edge sampling to reduce download cost, and exhibit another two-round algorithm which trades off between the download cost and error.

Third, given that it requires significant synchronization overhead to implement multiple rounds, I will focus on improved one-round algorithms for triangle counting. I propose such an algorithm which satisfies a slightly weaker notion of DP, the shuffled model. This algorithm uses a novel wedge-shuffling approach to count wedges, and gives rise to an accurate algorithm for counting 4-cycles as well. Fourth, I will shift gears to computing the degree vector in the presence of malicious users who do not follow the local DP protocol. I will detail robust algorithms, leveraging the fact that every edge in a graph is shared between two users, which use edge consistency checks to catch malicious users.

Fifth, I will highlight my work on private hierarchical clustering, a common clustering algorithm used for graphs and other data. I propose two algorithms and a lower bound in the general setting of the problem, and a near-optimal algorithm in the stochastic block model of graphs. The algorithm design and lower bound are of interest to designing future local DP protocols on graphs.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View