
These are the publications of the Center for Knowledge Infrastructures on eScholarship. We conduct research on scientific data practices and policy, scholarly communication, and socio-technical systems.


Library Cultures of Data Curation: Adventures in Astronomy

(2020)

University libraries are partnering with disciplinary data producers to provide long-term digital curation of research datasets. Managing data producers’ expectations and guiding future development of library services requires understanding the decisions libraries make about curatorial activities, why they make these decisions, and the effects on future data reuse. We present a study, comprising interviews (n=43) and ethnographic observation, of two university libraries that partnered with the Sloan Digital Sky Survey (SDSS) collaboration to curate a significant astronomy dataset. The two libraries made different choices about which materials to curate and which services to offer, resulting in different reuse possibilities. Each library offered a partial solution to the SDSS leaders’ objectives. The libraries’ approaches to curation diverged due to contextual factors, notably the extant infrastructure at their disposal (including technical infrastructure, staff expertise, values and internal culture, and organizational structure). The Data Transfer Process case offers lessons in understanding how libraries choose curation paths and how these choices influence possibilities for data reuse. Outcomes may not match data producers’ initial expectations but may create opportunities for reusing data in unexpected and beneficial ways.


Our knowledge of knowledge infrastructures: Lessons learned and future directions

(2020)

The Knowledge Infrastructures Workshop, conducted at UCLA in February 2020 and funded by the Alfred P. Sloan Foundation, revisited the goals and findings of the 2012 workshop held at the University of Michigan. Thirty scholars from a diverse array of disciplines and backgrounds charted a course for the next decade of knowledge infrastructure (KI) research. Such infrastructures are increasingly fragile, and often brittle, in the face of open data and open source, the demise of gatekeepers, and shifting public and private boundaries that redistribute power. Participants identified new methods and new opportunities for studying KI. Among the many scholarly products they proposed are publications, grant proposals, conference sessions, and workshops on the role of libraries in data services, the death and afterlives of KI, misinformation and disinformation in KI, KI in the Anthropocene, “N simplish rules” to grow and sustain KI, university capacities for KI, designing sustainable KI, and inclusion of underrepresented groups in the design of KI. The report, position papers, and other materials will be maintained at the KI workshop site, http://knowledgeinfrastructures.org.


Collaborative Ethnography at Scale: Reflections on 20 years of data integration

(2020)

A 5-year STS project in geography, starting in 1999, evolved into 20 years of data collection about scientific data practices in sensor networks, environmental sciences, biology, seismology, undersea science, biomedicine, astronomy, and other fields. By emulating the ‘team science’ approaches of the scientists studied, the UCLA Center for Knowledge Infrastructures accumulated a comprehensive collection of qualitative data about how scientists generate, manage, use, and reuse data across domains. Building upon Paul N. Edwards’s model of ‘making global data’ (collecting signals via consistent methods, technologies, and policies) to ‘make data global’ (comparing and integrating those data), the research team has managed and exploited these data as a collaborative resource. This article reflects on the social, technical, organizational, economic, and policy challenges the team has encountered in creating new knowledge from data old and new. We reflect on continuity over generations of students and staff, transitions between grants, transfer of legacy data between software tools, research methods, and the role of professional data managers in the social sciences.


Whose text, whose mining, and to whose benefit?

(2019)

Scholarly content has become more difficult to find as information retrieval has devolved from bespoke systems that exploit disciplinary ontologies to keyword search on generic search engines. In parallel, more scholarly content is available through open access mechanisms. These trends have failed to converge in ways that would facilitate text data mining, both for information retrieval and as a research method for the quantitative social sciences. Scholarly content has become open to read without becoming open to mine, due both to constraints imposed by publishers and to lack of attention in scholarly communication. The quantity of available text has grown faster than its quality. Academic dossier systems are among the means to acquire higher-quality data for mining. Universities, publishers, and private enterprise may be able to mine these data for strategic purposes, however. On the positive front, changes in copyright may allow more data mining. Privacy, intellectual freedom, and access to knowledge are at stake. The next frontier of activism in open access scholarship is control over content for mining as a means to democratize knowledge.