eScholarship
Open Access Publications from the University of California

The California Digital Library supports the assembly and creative use of the world's scholarship and knowledge for the University of California libraries and the communities they serve.

In addition, the CDL provides tools that support the construction of online information services for research, teaching, and learning, including services that enable the UC libraries to effectively share their materials and provide greater access to digital content.


Cobweb: Collaborative Collection Development for Web Archives

(2018)

A presentation to staff of the California Digital Library, providing an update on Cobweb development progress as of January 2018.

Curation is Not a Place: Post-Custodial Stewardship for a Do-It-Yourself World

(2017)

Academic libraries operate in an increasingly crowded information space shared with many new public and private actors characterized by overlapping spheres of intention, capability, and responsibility. In the areas of digital and data curation, library-hosted repository and preservation solutions are competing against alternatives with a lower barrier to entry, a better user experience, and a perception of functional sufficiency. As these alternatives are also often free, libraries face increasing difficulty in retaining, let alone increasing, service adoption by their stakeholder communities. One possible solution emerges from questioning the often tacit assumption that custodial stewardship is central. What are the consequences of shifting curatorial imperatives away from *holding* a copy of a given information object to *knowing* where all the copies are? This talk explores ideas for a post-custodial stewardship regime under which curatorial functions are applied in situ to radically dispersed content. In today’s do-it-yourself information environment, content will inevitably be manifest in a wide variety of venues. While these venues individually may fall well short of desirable levels of reliability and persistence, harnessing enough of them within a unified post-custodial framework can nevertheless produce desirable global outcomes. A post-custodial pattern of stewardship embraces, rather than futilely combats, the realities of today’s information ecosystem, filled with many well-funded commodity service providers. Post-custodial curation has the potential to turn these (probably unassailable) competitors into (possibly unwitting) collaborators and, through an appropriate division of labor, encourages libraries to direct their finite programmatic resources toward high-impact initiatives where they can uniquely add value.


Securing the Future of Federal Research: Mirroring Data.gov as a Vital Scholarly Resource

(2017)

The recent transition of US presidential administrations has raised awareness and concern regarding the continuity of access to federal research data. These data are part of the vital public record of federally funded research, and their continued availability is critically important to scientific integrity and advancement, governmental accountability, and informed public policy. The data.gov portal was created in 2009 as a central repository of government research data, and currently hosts over 135,000 datasets. This information is, according to the 2013 federal open data policy, “a valuable national resource and a strategic asset to the Federal Government, its partners, and the public.” As such, it is imperative that these data be subject to effective long-term stewardship. Best practice within the preservation community calls for redundancy, at both the technical and organizational levels, as a primary strategy for higher preservation assurance. Consequently, the California Digital Library (CDL) and Code for Science & Society (CSS) collaborated with the data.gov development team on datamirror.org, a full dynamic mirror of data.gov. datamirror.org holds descriptive metadata and links to the dataset copies of record on federal agency websites, as well as alternative links to local datamirror-managed replicas (41 TB) and, soon, to other known copies that may emerge through the efforts of the national data rescue movement, in which CDL and CSS are active participants. While instigated by recent political events, the stewardship provided by datamirror.org is simply an expression of the prudent research data management needed to ensure permanent access to the nation’s rich digital patrimony.


Cobweb: Collaborative Collection Development for Web Archives

(2017)

The massive scale of online news calls for collaborative approaches to assessing what has already been archived and asserting which un-archived sites should be preserved, considering their significance in representing a “first draft of history.” And given the fast pace at which online news content evolves, memory institutions interested in preserving this content may need to understand not just what has been archived, but also the temporal and structural conditions that were applied in its archiving. Researchers and journalists likewise have an interest in understanding what online news content is available to them, and are well-positioned to make recommendations about which content should be preserved, why, and at what pace, so that it can be made available into the future. There is therefore a potential for highly productive synergies between the news archiving community and the Cobweb initiative, which is building a collaborative collection development platform supporting the creation of comprehensive web archives by coordinating the independent activities of the web archiving community.


The demands of archiving the web in comprehensive breadth or thematic depth easily exceed the capacity of any single institution, and the same is true of online news content. To ensure that the limited resources of a given archival program are deployed most effectively, its curators need to know something about the collection development priorities and holdings of other, similarly engaged institutions. Cobweb, https://github.com/CobwebOrg/cobweb, will meet this need by supporting three key functions of collaborative collection development of web archives: nominating, claiming, and holdings. The nomination function will let curators and stakeholders (including researchers and journalists) suggest web sites pertinent to specific thematic areas and provide seed-level descriptive metadata; the claiming function will allow archival programs to indicate an intention to capture some subset of nominated sites; and the holdings function will allow programs to document captured sites along with their collection-level description, structural and temporal scope, preservation policies, and terms of use. By aggregating descriptions of websites archived by many distributed archival programs and making them discoverable, Cobweb is also intended to make it easier for researchers and journalists to find versions of websites relevant to their work.
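As a rough illustration of how the three functions fit together, the Python sketch below models nominations, claims, and holdings in memory. Every class and method name here (`Registry`, `nominate`, `claim`, `record_holding`, `unclaimed`, `holders_of`) is hypothetical and is not taken from Cobweb's actual data model or code; the sketch only shows how an "unclaimed" query could expose nominated sites that no program has yet committed to capture.

```python
from dataclasses import dataclass, field


@dataclass
class Nomination:
    """Hypothetical: a seed URL suggested for a thematic project."""
    url: str
    project: str
    nominator: str
    metadata: dict = field(default_factory=dict)  # seed-level description


@dataclass
class Claim:
    """Hypothetical: a program's stated intention to capture a seed."""
    url: str
    program: str


@dataclass
class Holding:
    """Hypothetical: a documented capture with collection-level context."""
    url: str
    program: str
    collection: str
    temporal_scope: str
    terms_of_use: str


class Registry:
    """Hypothetical in-memory stand-in for a shared collaborative catalogue."""

    def __init__(self):
        self.nominations = {}  # url -> first Nomination received for it
        self.claims = []
        self.holdings = []

    def nominate(self, nomination):
        # Simplification: a repeat nomination of the same URL is ignored.
        self.nominations.setdefault(nomination.url, nomination)

    def claim(self, url, program):
        # Only nominated seeds can be claimed in this sketch.
        if url not in self.nominations:
            raise KeyError(f"{url} has not been nominated")
        self.claims.append(Claim(url, program))

    def record_holding(self, holding):
        self.holdings.append(holding)

    def unclaimed(self, project):
        """Nominated seeds in a project that no program has claimed."""
        claimed = {c.url for c in self.claims}
        return sorted(
            n.url for n in self.nominations.values()
            if n.project == project and n.url not in claimed
        )

    def holders_of(self, url):
        """Programs that have documented a capture of this URL."""
        return sorted({h.program for h in self.holdings if h.url == url})


reg = Registry()
reg.nominate(Nomination("https://example.org/news", "local-news", "researcher-a"))
reg.nominate(Nomination("https://example.com/daily", "local-news", "journalist-b"))
reg.claim("https://example.org/news", "program-x")
print(reg.unclaimed("local-news"))  # ['https://example.com/daily']
```

Keeping claims separate from holdings mirrors the distinction drawn in the text: a claim records intent before capture, while a holding documents what was actually captured and under what terms.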


Cobweb is a collaborative project of the California Digital Library, Harvard University, and UCLA, funded by the Institute of Museum and Library Services. This presentation provides an update on recent project activities, which include finalizing data models and functional requirements, user experience/interface design, development of a prototype interface and database, and exploration of how the Cobweb system interacts with other web archiving systems.


Collaborative Collection Development with Cobweb 

(2017)

The demands of archiving the web in comprehensive breadth or thematic depth easily exceed the capacity of any single institution. To ensure that the limited resources of a given archival program are deployed most effectively, its curators need to know something about the collection development priorities and holdings of other, similarly engaged institutions. Cobweb, https://github.com/CobwebOrg/cobweb, will meet this need by supporting three key functions: nominating, claiming, and holdings. The nomination function will let curators and stakeholders suggest web sites pertinent to specific thematic areas and provide seed-level descriptive metadata; the claiming function will allow archival programs to indicate an intention to capture some subset of nominated sites; and the holdings function will allow programs to document captured sites along with their collection-level description, structural and temporal scope, preservation policies, and terms of use. Cobweb is a collaborative project of the CDL, Harvard University, and UCLA. This presentation provides an update on project activities as of July 2017.