Open Access Publications from the University of California

Recent Work

The UC Berkeley School of Information prepares leaders and pioneers solutions to the challenges of transforming information--ubiquitous, abundant, and evolving--into knowledge.

Through our Master's program, focused in five areas of concentration, we train students for careers as information professionals and entrepreneurs. Through our Ph.D. program and faculty research, we explore and develop solutions and shape policies that influence how people seek, use, and share information to create knowledge. Our work takes us wherever information touches lives, often bringing us into partnership with diverse disciplines, from law, sociology, and business to publishing, linguistics, and computer science.

Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data Economics


The creators of technical infrastructure are under social and legal pressure to comply with expectations that can be difficult to translate into computational and business logics. This dissertation bridges this gap through three projects that focus on privacy engineering, information security, and data economics, respectively. These projects culminate in a new formal method for evaluating the strategic and tactical value of data: data games. This method relies on a core theoretical contribution building on the work of Shannon, Dretske, Pearl, Koller, and Nissenbaum: a definition of situated information flow as causal flow in the context of other causal relations and strategic choices.

The first project studies privacy engineering's use of Contextual Integrity theory (CI), which defines privacy as appropriate information flow according to norms specific to social contexts or spheres. Computer scientists using CI have innovated as they have implemented the theory and blended it with other traditions, such as context-aware computing. This survey examines computer science literature using Contextual Integrity and discovers, among other results, that technical and social platforms that span social contexts challenge CI's current commitment to normative social spheres. Sociotechnical situations can and do defy social expectations with cross-context clashes, and privacy engineering needs its normative theories to acknowledge and address this fact.

This concern inspires the second project, which addresses the problem of building computational systems that comply with data flow and security restrictions such as those required by law. Many privacy and data protection policies stipulate restrictions on the flow of information based on that information's original source. We formalize this concept of privacy as Origin Privacy. This formalization shows how information flow security can be represented using causal modeling. Causal modeling of information security leads to general theorems about the limits of privacy by design as well as a shared language for representing specific privacy concepts such as noninterference, differential privacy, and authorized disclosure.

The third project uses the causal modeling of information flow to address gaps in current theory of data economics. Like CI, privacy economics has focused on individual economic contexts and so has been unable to comprehend an information economy that relies on the flow of information across contexts. Data games, an adaptation of Multi-Agent Influence Diagrams for mechanism design, are used to model the well-known economic contexts of principal-agent contracts and price differentiation as well as new contexts such as personalized expert services and data reuse. This work reveals that information flows are not goods but rather strategic resources, and that trade in information therefore involves market externalities.

Strange and Unstable Fabrication


In the 1950s a group of artists led by experimental composer John Cage actively engaged chance as a means to limit their control over the artworks they produced. These artists described a world filled with active and lively forces, from the sounds of rain to blemishes in paper, that could be harnessed in creative production to give rise to new aesthetics and cultivate new sensitivities to the everyday. This approach to making was not simply an act of creative expression but an active attempt at creative expansion: a way of submitting to a world of creative forces beyond the self for the sake of seeing, hearing, or feeling things anew. I use these practices as a lens to reflect on the way human-computer interaction (HCI) researchers think about and design for making, specifically as it relates to the present-day “maker movement.” I focus on how the design of digital fabrication systems, like 3D printers, could make room for creative forces beyond the maker and why such modes of making are worth considering in HCI research. Since digital fabrication technologies have catalyzed the maker movement and are often described as key instruments for “democratizing” manufacturing, this project joins broader efforts to reflect on values in maker technology as a means of expanding the design space of digital fabrication in ways that could potentially increase the diversity of participants associated with the movement.

By weaving through post-anthropocentric theories of the new materialisms, design practice, art history, and HCI, I contribute a theory of making that accounts for the creative capacity of nonhumans as well as design tactics to make room for nonhuman forces in the design of digital fabrication systems. I argue that nonhumans exert material-semiotic forces upon makers that shape their perspectives on stuff and culture in tandem. I then suggest that tools that are both strange and unstable create a space for makers to perceive and work with these forces in ways that honor the unique life and agency of nonhuman matter. As a whole, this work adds dimensionality to HCI’s existing focus on making as a process of self-expression by suggesting new design territories in fabrication design, crossings between critical reflection and creative production. I close this work by speculating on how tools that trade control, mastery, and predictability for chance, compromise, labor, and risk could become valuable within a broader landscape of making.

Feed Subscription Management


An increasing number of data sources and services are made available on the Web, and in many cases these information sources are or easily could be made available as feeds. However, the more data sources and services are exposed through feed-based services, the more it becomes necessary to manage and be able to share those services, so that users and uses of those services can build on the foundation of an open and decentralized architecture. In this paper we present the Feed Subscription Management (FSM) architecture, which is a model for managing feed subscriptions and supports structured feed subscriptions. Based on FSM, it is easy to build services that manage feed-based services, so that those feed-based services can easily create, change, and delete feed subscriptions, and so that feed subscriptions can easily be shared across users and/or devices. Our main reason for focusing on feeds is that we see feeds as a good foundation for an ecosystem of RESTful services, and thus our architectural approach revolves around the idea of modeling services as interactions with feeds.

From RESTful Services to RDF: Connecting the Web and the Semantic Web


RESTful services on the Web expose information through retrievable resource representations that are self-describing descriptions of resources, and through the way these resources are interlinked by the hyperlinks found in those representations. This basic design of RESTful services means that extracting the most useful information from a service requires understanding the service's representations, both their semantics in terms of describing a resource and their semantics in terms of describing its linkage with other resources. Based on the Resource Linking Language (ReLL), this paper describes a framework for how RESTful services can be described, and how these descriptions can then be used to harvest information from these services. Building on this framework, a layered model of RESTful service semantics makes it possible to represent a service's information in RDF/OWL. Because REST is based on the linkage between resources, the same model can be used for aggregating and interlinking multiple services for extracting RDF data from sets of RESTful services.
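
The harvesting idea can be illustrated with a minimal Python sketch that turns one resource representation into RDF-style triples. It is not ReLL and does not use ReLL syntax: it assumes the representation is an Atom entry, and the function name `harvest_triples`, the choice of predicates, and the example URIs are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed mapping: Atom link relations become predicates under the IANA
# link-relation prefix; the title maps to Dublin Core terms.
REL_PREFIX = "http://www.iana.org/assignments/relation/"
DC_TITLE = "http://purl.org/dc/terms/title"


def harvest_triples(resource_uri, representation_xml):
    """Return (subject, predicate, object) tuples for one representation.

    One triple describes the resource itself (its title); the remaining
    triples describe its linkage to other resources.
    """
    root = ET.fromstring(representation_xml)
    triples = []

    # Semantics that describe the resource itself.
    title = root.findtext(f"{ATOM}title")
    if title is not None:
        triples.append((resource_uri, DC_TITLE, title))

    # Semantics that describe linkage with other resources.
    for link in root.iter(f"{ATOM}link"):
        href = link.get("href")
        if href is None:
            continue
        rel = link.get("rel", "alternate")  # Atom's default relation
        target = urljoin(resource_uri, href)  # resolve relative links
        triples.append((resource_uri, REL_PREFIX + rel, target))

    return triples
```

Running the function over every harvested target URI in turn would follow the links from one resource to the next, which is the sense in which the same model can aggregate several services.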

Improving Federal Spending Transparency: Lessons Drawn from


Information about federal spending can affect national priorities and government processes, having impacts on society that few other data sources can rival. However, building effective open government and transparency mechanisms holds a host of technical, conceptual, and organizational challenges. To help guide development and deployment of future federal spending transparency systems, this paper explores the effectiveness of accountability measures deployed for the American Recovery and Reinvestment Act of 2009 ("Recovery Act" or "ARRA"). The Recovery Act provides an excellent case study to better understand the general requirements for designing and deploying "Open Government" systems. In this document, we show specific examples of how problems in data quality, service design, and systems architecture limit the effectiveness of ARRA's promised transparency. We also highlight organizational and incentive issues that impede transparency, and point to design processes as well as general architectural principles needed to better realize the goals advanced by open government advocates.

Privacy Issues of the W3C Geolocation API


The W3C's Geolocation API may rapidly standardize the transmission of location information on the Web, but, in dealing with such sensitive information, it also raises serious privacy concerns. We analyze the manner and extent to which the current W3C Geolocation API provides mechanisms to support privacy. We propose a privacy framework for the consideration of location information and use it to evaluate the W3C Geolocation API, both the specification and its use in the wild, and recommend some modifications to the API as a result of our analysis.

KnowPrivacy



Online privacy and behavioral profiling are of growing concern among both consumers and government officials. In this report, we examine both the data handling practices of popular websites and the concerns of consumers in an effort to identify problematic practices. We analyze the policies of the 50 most visited websites to better understand disclosures about the types of data collected about users, how that information is used, and with whom it is shared. We also look at specific practices such as sharing information with affiliates and third-party tracking. To understand user concerns and knowledge of data collection, we look at surveys and polls conducted by previous privacy researchers. We look at records of complaints and inquiries filed with privacy watchdog organizations such as the Federal Trade Commission, the Privacy Rights Clearinghouse, the California Office of Privacy Protection, and TRUSTe. Finally, to gain some insight into what aspects of data collection users are being made aware of, we look at news articles from three major newspapers for topics related to Internet privacy. Based on our findings, we make recommendations for website operators, government regulators, and technology developers.

Web Services for


One of the main goals of the Web site is to provide information about how funds for the American Recovery and Reinvestment Act (ARRA) of 2009 are allocated and spent. In this report, we propose a reporting architecture that would focus on the reporting services rather than the Web site and page design, and that uses these Web services to build the user-facing part of ARRA reporting. Our proposed architecture is based on simple and well-established Web technologies, and the main goal of this architecture is to provide citizens and watchdog groups simple and easy access to machine-readable data. Our architecture uses a more sophisticated framework than simple downloads of data files. Our proposed architecture is based on the principles of Representational State Transfer (REST) and uses established and widely supported Web technologies such as feeds and XML. We argue that such an architecture is easy to design and implement, easy to understand for users, and easy to work with for those who want to access ARRA reporting data in a machine-readable way.
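
As a rough illustration of a feed-based reporting service, the following Python sketch serializes a list of spending records as an Atom feed. It is an assumption-laden example, not the architecture proposed in the report: the function name `build_feed`, the record fields (`id`, `title`, `updated`, `data_url`), and the example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def build_feed(feed_id, title, updated, author_name, records):
    """Serialize hypothetical spending records as an Atom feed string.

    Each record is a dict with the assumed keys "id", "title",
    "updated" (an RFC 3339 timestamp), and "data_url", which points to
    the machine-readable XML for that record.
    """
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    feed = ET.Element(_atom("feed"))
    ET.SubElement(feed, _atom("id")).text = feed_id
    ET.SubElement(feed, _atom("title")).text = title
    ET.SubElement(feed, _atom("updated")).text = updated
    author = ET.SubElement(feed, _atom("author"))
    ET.SubElement(author, _atom("name")).text = author_name

    for record in records:
        entry = ET.SubElement(feed, _atom("entry"))
        ET.SubElement(entry, _atom("id")).text = record["id"]
        ET.SubElement(entry, _atom("title")).text = record["title"]
        ET.SubElement(entry, _atom("updated")).text = record["updated"]
        # Link each entry to the record's machine-readable data.
        ET.SubElement(entry, _atom("link"), {
            "rel": "alternate",
            "type": "application/xml",
            "href": record["data_url"],
        })

    return ET.tostring(feed, encoding="unicode")
```

A client that already understands feeds can poll such a document for new entries and follow each entry's link to the underlying data, without any knowledge of the site's page design.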

LODE: Linking Open Descriptions of Events


People conventionally refer to an action or occurrence taking place at a certain time at a specific location as an event. This notion is potentially useful for connecting individual facts recorded in the rapidly growing collection of linked data sets and for discovering more complex relationships between data. In this paper, we provide an overview and comparison of existing RDFS+OWL event models, looking at the different choices they make in how to represent events. We describe a recommended model for publishing records of events as Linked Data. We present tools for populating this model and a prototype of an "event directory" web service, which can be used to locate stable URIs for events that have occurred and to provide RDFS+OWL descriptions of them and links to related resources.
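
To make the notion of an event record concrete, here is a minimal Python sketch that represents an event as a URI with a time, a place, and involved agents, and flattens it into subject-predicate-object triples. The vocabulary namespace `http://example.org/event-vocab#` and its property names are placeholders chosen for this sketch, not the terms of the model recommended in the paper.

```python
from dataclasses import dataclass, field

# Hypothetical vocabulary, used only for illustration.
EX = "http://example.org/event-vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


@dataclass
class EventRecord:
    uri: str        # stable URI identifying the event
    label: str      # human-readable name
    at_time: str    # ISO 8601 date or interval
    at_place: str   # URI of the location
    agents: list = field(default_factory=list)  # URIs of involved agents

    def to_triples(self):
        """Return the record as (subject, predicate, object) tuples."""
        triples = [
            (self.uri, RDF_TYPE, EX + "Event"),
            (self.uri, RDFS_LABEL, self.label),
            (self.uri, EX + "atTime", self.at_time),
            (self.uri, EX + "atPlace", self.at_place),
        ]
        # One triple per involved agent.
        triples.extend(
            (self.uri, EX + "involvedAgent", agent) for agent in self.agents
        )
        return triples
```

Because the place and the agents are themselves URIs, two records that mention the same place or agent become connected as soon as their triples are loaded into the same store.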

Implementing Risk-Limiting Audits in California


Risk-limiting post-election audits limit the chance of certifying an electoral outcome if the outcome is not what a full hand count would show. Building on previous work, we report on pilot risk-limiting audits in four elections during 2008 in three California counties: one during the February 2008 Primary Election in Marin County and three during the November 2008 General Elections in Marin, Santa Cruz and Yolo Counties. We explain what makes an audit risk-limiting and how existing and proposed laws fall short. We discuss the differences among our four pilot audits. We identify challenges to practical, efficient risk-limiting audits and conclude that current approaches are too complex to be used routinely on a large scale. One important logistical bottleneck is the difficulty of exporting data from commercial election management systems in a format amenable to audit calculations. Finally, we propose a bare-bones risk-limiting audit that is less efficient than these pilot audits, but avoids many practical problems.
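
What "risk-limiting" means in the first sentence can be illustrated with a minimal Python sketch of a ballot-polling audit for a two-candidate contest, built on Wald's sequential probability ratio test. This is a different, simplified technique and not the procedure used in the four pilot audits; the function name, its parameters, and the "W"/"L" ballot encoding are assumptions made for illustration.

```python
import random


def ballot_polling_audit(ballots, reported_winner_share, risk_limit,
                         max_draws, rng):
    """Sample ballots until the reported outcome is confirmed or we give up.

    ballots: list of "W" (vote for the reported winner) or "L" (loser).
    reported_winner_share: reported vote share of the winner, above 0.5.
    risk_limit: largest acceptable chance of confirming a wrong outcome.
    Returns ("confirmed", n) or ("full hand count", n), where n is the
    number of ballots drawn.
    """
    if not ballots:
        raise ValueError("no ballots to audit")
    if not 0.5 < reported_winner_share < 1.0:
        raise ValueError("reported winner share must be in (0.5, 1)")
    if not 0.0 < risk_limit < 1.0:
        raise ValueError("risk limit must be in (0, 1)")

    threshold = 1.0 / risk_limit
    ratio = 1.0  # likelihood ratio: reported share versus a tie
    for n in range(1, max_draws + 1):
        ballot = rng.choice(ballots)  # draw with replacement
        if ballot == "W":
            ratio *= reported_winner_share / 0.5
        else:
            ratio *= (1.0 - reported_winner_share) / 0.5
        if ratio >= threshold:
            return "confirmed", n

    # Evidence was insufficient: escalate instead of certifying.
    return "full hand count", max_draws
```

If the reported winner did not actually win, the ratio has expected growth of at most 1 per draw, so the chance that it ever reaches `1 / risk_limit` is at most `risk_limit`. Calling `ballot_polling_audit(ballots, 0.6, 0.05, 1000, random.Random(0))` runs one audit with a 5% risk limit.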