California Digital Library
Securing the Future of Federal Research: Mirroring Data.gov as a Vital Scholarly Resource
- Author(s): Abrams, Stephen, et al.
The recent transition of US presidential administrations has raised awareness of, and concern about, the continuity of access to federal research data. These data are part of the vital public record of federally funded research, and their continued availability is critically important to scientific integrity and advancement, governmental accountability, and informed public policy. The data.gov portal was created in 2009 as a central repository of government research data and currently hosts over 135,000 datasets. According to the 2013 federal open data policy, this information is “a valuable national resource and a strategic asset to the Federal Government, its partners, and the public.” As such, it is imperative that these data be subject to effective long-term stewardship. Best practice within the preservation community calls for redundancy, at both the technical and organizational levels, as a primary strategy for higher preservation assurance. Consequently, the California Digital Library (CDL) and Code for Science & Society (CSS) collaborated with the data.gov development team on datamirror.org, a full dynamic mirror of data.gov. The datamirror.org site holds descriptive metadata and links to the dataset copies of record on federal agency websites. It also provides alternative links to local datamirror-managed replicas (41 TB) and will soon link to other known copies that may emerge through the efforts of the national data rescue movement, in which CDL and CSS are active participants. Although prompted by recent political events, the stewardship provided by datamirror.org is simply an expression of prudent research data management, which is clearly needed to ensure permanent access to the nation’s rich digital patrimony.