Open Access Publications from the University of California

School of Information

UC Berkeley


The School of Information creates knowledge and advances practice wherever people interact with information and technology.

Our research explores the implications for individuals and society as information and digital technologies are increasingly embedded in all aspects of human experience. Our professional master’s degrees prepare students to design and build the systems that will shape the way humans live and interact in the future.

Our research and teaching are interconnected; both are urgent, because our understanding of the consequences for individuals and society of their interactions with information and machines remains critical, contentious, and inadequate.


There are 158 publications in this collection, published between 1972 and 2021.
Open Access Policy Deposits (110)

Decibel: The Relational Dataset Branching System.

As scientific endeavors and data analysis become increasingly collaborative, there is a need for data management systems that natively support the versioning or branching of datasets to enable concurrent analysis, cleaning, integration, manipulation, or curation of data across teams of individuals. Common practice for sharing and collaborating on datasets involves creating or storing multiple copies of the dataset, one for each stage of analysis, with no provenance information tracking the relationships between these datasets. This not only wastes storage but also makes it challenging to track and integrate modifications made by different users to the same dataset. In this paper, we introduce the Relational Dataset Branching System, Decibel, a new relational storage system with built-in version control designed to address these shortcomings. We present our initial design for Decibel and provide a thorough evaluation of three versioned storage engine designs that focus on efficient query processing with minimal storage overhead. We also develop an exhaustive benchmark to enable the rigorous testing of these and future versioned storage engine designs.

Futzing and Moseying: Interviews with Professional Data Analysts on Exploration Practices.

We report the results of interviewing thirty professional data analysts working in a range of industrial, academic, and regulatory environments. This study focuses on participants' descriptions of exploratory activities and tool usage in these activities. Highlights of the findings include: distinctions between exploration as a precursor to more directed analysis versus truly open-ended exploration; confirmation that some analysts see "finding something interesting" as a valid goal of data exploration while others explicitly disavow this goal; conflicting views about the role of intelligent tools in data exploration; and pervasive use of visualization for exploration, but with only a subset using direct manipulation interfaces. These findings provide guidelines for future tool development, as well as a better understanding of the meaning of the term "data exploration" based on the words of practitioners "in the wild."

Privacy Decisionmaking in Administrative Agencies

Administrative agencies increasingly rely on technology to achieve substantive goals. Often this technology is employed to collect, exchange, manipulate and store personally identifiable information, raising serious concerns about the erosion of personal privacy.

Congress has recognized this problem. In the E-Government Act of 2002, it required administrative agencies to conduct privacy impact assessments (PIAs) when developing or procuring technology systems that handle personal information. Despite this new requirement, however, agency adherence to privacy mandates is highly inconsistent.

In this paper, we ask why. We first explore why both process requirements and traditional means of political oversight are often weak tools for ensuring that policy reflects privacy commitments. We then consider what factors might, by contrast, promote agency consideration of privacy concerns.

Specifically, we compare decisions by two federal agencies - the Department of State and the Department of Homeland Security - to use RFID technology, which allows a wireless-access data chip to be attached to or inserted into a product, animal, or person. These two cases suggest the importance of internal agency structure, culture, and personnel, as well as alternative forms of external oversight, interest group engagement, and professional expertise, as mechanisms for ensuring bureaucratic accountability to the secondary privacy mandate imposed by Congress.

The analysis speaks to debates in both public administration and privacy protection. It implicates disputes over the efficacy of external controls on bureaucracy, and the less-developed literature on opening the black box of administrative decisionmaking. It further offers insight into pre-conditions necessary to advance privacy commitments in the face of social and bureaucratic pressure to manage risk by collecting information about individuals. Finally, it offers specific proposals for policy reform intended to promote agency accountability to privacy goals.

Recent Work (52)

Location Management for Mobile Devices

Location-awareness, in the form of location information about clients and location-based services provided by servers, is becoming increasingly important for networked communications in general, and wireless and mobile devices in particular. The current fragmented landscape of location concepts and location-awareness, however, is not suitable for handling location information on a Web scale. Providing users with mechanisms that let them control how they expose their location information, and thus how they share it with other users and with services, is a crucial step toward better location management for mobile devices. This paper presents a concept for representing location vocabularies, for matching and mapping them, and for using them to support better privacy for users of location-based services and better location sharing between users and services. The concept is based on a language for describing place name vocabularies, which we call "Place Markup Language (PlaceML)", and on various ways in which these vocabularies can be used in a location-aware infrastructure of networked devices.

Destination Services: Tourist media and networked places

Tourism exists in the interplay between places and stories. In making sense of travel, we are also making sense of ourselves and the world around us. Indeed, the global tourist industry produces places as “destinations” through stories and souvenirs. The audience for tourism stories has changed greatly with changes in technologies of communication and representation, one of the most radical being the introduction of networked media. With the rise of web-based services, tourist experiences have acquired a digital penumbra of content available in ever more formats and locations. This paper examines these technological changes, and the potential consequences for digital storytelling, travel, and the production of destinations.

Practical Obscurity in the Digital Age: Public Records in the Private Sector

In this paper, I outline the legislative framework governing information privacy practices in the public and private sectors in the United States and, more narrowly, the state of California, with particular attention paid to criminal justice system information. I explore the relationship between the courts, which maintain public criminal records, and Corporate Data Brokers (CDBs), which aggregate and sell information from court records, as well as the accuracy and privacy of their systems. While legislation guiding the government's handling of information may need to be extended to the private sector, state governments have a role to play in improving their technology infrastructure to ensure that accurate, timely information is available in the public records. This is particularly important for the criminal justice system, the source of the data that brokers collect. In making this argument, I look at one state, Colorado, that did a great deal early on to improve its criminal records technology infrastructure.
