Open Data in Astronomy Sky Surveys
Summary

This talk examines characteristics of openness in the collection, dissemination, and reuse of data in two astronomy sky survey case studies: the Sloan Digital Sky Survey (SDSS) and the Large Synoptic Survey Telescope (LSST). Discussion includes how the SDSS and LSST data, and datasets derived from the projects by end users, become available for reuse. Differences between the two projects include the rate at which data are released, the populations to which the data are made open, the length of time data creators plan to make the data available, the scale at which these endeavors take place, and the stage each project has reached.

Abstract

Modern sky surveys represent an era of open data in astronomy. While sky surveys are not new, and observational star catalogs have been generated for millennia, modern sky surveys are a qualitatively different kind of data collection. Distributed stakeholders collaborate to generate large amounts of uniform data, covering a large portion of the night sky. How those data are collected, processed, made available, and reused differs between astronomy projects and individual users of data.

This talk examines characteristics of openness in two prominent optical sky surveys: the Sloan Digital Sky Survey (SDSS) and the Large Synoptic Survey Telescope (LSST). The first phase of the SDSS project (SDSS-I) ran from 2000 to 2005, the second (SDSS-II) from 2005 to 2008, and subsequent SDSS projects continue today. Preparation for the LSST started in the late 1990s, the survey is currently under construction, and data collection is expected to begin in 2020. We have studied the SDSS since 2009 and the LSST since 2011 with research methods that include document analysis, semi-structured interviews, and ethnographic participant observation.

The SDSS dataset is significant in terms of scope, quality, public access, and variety of uses and users.
The survey covered over a quarter of the night sky with high-quality photometric and spectroscopic imaging. SDSS was the first ground-based astronomy survey to ensure prompt public release of data, with only a short proprietary period to clean the data and prepare them for release. Most science papers employing SDSS data are written by end users unaffiliated with the official collaboration. Many collaborative telescope projects now emulate the open data practices pioneered by SDSS.

The LSST plans to make a map of the sky every three to four days, resulting in an estimated 15 terabytes of data collected each night over the course of ten years of observations. While the SDSS had a short proprietary period for data processing, the LSST expects to release data immediately, without a proprietary period for the project’s investigators. However, funding requirements do constrain the policies and practices of data release. To ensure that the approximately one-billion-dollar project is fully funded, the LSST uses data access to entice institutions and countries to contribute resources to the project. While SDSS data were released over the web and made open internationally, LSST expects to offer different levels of data access, determined by each country’s partnership level. Given the scale of investments necessary for infrastructure, data collection, processing, maintenance, and sustaining access to the resulting datasets, creative funding models may be crucial for future projects the size of sky surveys.

SDSS and LSST collaboration teams manage data as part of the projects. However, the goal of these surveys is not data per se, but to further scientific knowledge. Scientists are the end users of these data. Datasets retrieved from SDSS may be used alone or in combination with other datasets; similar models of science are expected for LSST.
“Derived” datasets result from further processing by scientific users.

While the SDSS and LSST projects were designed for data sharing and management, individuals and small groups of researchers are rarely able to provide long-term access to their derived datasets (Sands, 2016). SDSS data continue to be used for scientific investigations years after they were initially collected. These data are also used to design and calibrate the next generation of astronomy sky surveys, including the LSST.

Open access to the SDSS data has been a critical part of ensuring the success of the project, serving practical and scientific goals, including the ability to secure further funding. The LSST plans to generate open data, but under a different policy framework for how, when, and to whom the data are accessible. While open sky survey data are often used in the course of scientific research, the resulting derived datasets are rarely made open. Concepts of open data vary widely between the SDSS team, the LSST team, and individual sky survey data users (Pasquetto, Sands, Darch, & Borgman, 2016).