In a free society, people have the right to consume and share public data without fear of retribution. However, today's technological landscape enables large-scale monitoring and censorship of networks by powerful entities (e.g., totalitarian governments); at worst, these entities may punish people for the information they consume or the opinions they espouse. This thesis considers two problems aimed at empowering people to freely share and consume public information: anonymous message spreading and privacy-preserving database search. In both areas, we present algorithmic innovations and analyze their correctness and efficiency. The key assumption underlying our work is that centralized architectures cannot reliably provide privacy-preserving services; not only are the incentive structures misaligned, but centralized infrastructures are often vulnerable to external breaches by hackers or government agencies, for example. We therefore restrict ourselves to distributed algorithms that rely on cooperation and resource-sharing between privacy-conscious individuals.
In the area of anonymous message spreading, we consider a user who wishes to spread a message to as many people as possible over an underlying connectivity network (e.g., a social network); this is the premise of Yik Yak, Whisper, and other popular anonymous messaging networks. Most existing social networks (anonymous or not) use a push-based mechanism to spread content to all of a user's neighbors on the contact network; if a neighbor approves the content by `liking' it, this symmetric spreading propagates to the neighbor's neighbors, and so forth. Recent research suggests that under this spreading model, the true author of a message can be identified with non-negligible probability by a powerful global adversary. We propose an alternative, distributed spreading mechanism called adaptive diffusion, which breaks this symmetry. We show theoretically that adaptive diffusion gives optimal or asymptotically-optimal anonymity guarantees over certain classes of synthetic graphs for various adversarial models, while spreading nearly as fast as traditional symmetric mechanisms. On real-world graphs, we demonstrate empirically that adaptive diffusion gives significantly stronger anonymity properties than existing spreading mechanisms.
In the area of privacy-preserving search, we consider the foundations of a distributed, privacy-preserving search engine built over public data. Architecturally, we envision a peer-to-peer (P2P) network in which each user stores a small piece of a public database; when a user wishes to search for something, she obtains it by requesting the information from her peer nodes, which execute a distributed search over the relevant data index. Critically, this operation should be privacy-preserving---that is, no peer node should learn anything about the contents of the user's query. Distributed search engines are not a new idea, but making such a service privacy-preserving and robust is algorithmically challenging.
One challenge is that if the database is changing over time and the network is not centrally controlled, distributed users may not have a unified view of the database. That is, some peers may be storing an outdated portion of the global database, causing existing private search and retrieval algorithms to fail. We introduce a distributed private retrieval algorithm that is robust to servers with similar, but not identical, views of the database, and show that it incurs asymptotically negligible overhead compared to traditional algorithms. Another challenge is that most search engine users submit queries with multiple keywords, and expect the result to contain all of the queried keywords; this is known as a conjunctive query. Existing distributed algorithms return results that contain at least one of the queried keywords. This approach can incur significant communication overhead, as well as computational overhead for the client, who must sort through the results. We propose a new privacy-preserving search algorithm that processes conjunctive queries while incurring a communication cost that scales linearly in the number of documents that contain all the queried keywords. Our private-search algorithms build on principles from distributed source coding, which permit us to reduce the communication cost by exploiting correlations between the data of distributed peer nodes.