Managing Astronomy Research Data: Data Practices in the Sloan Digital Sky Survey and Large Synoptic Survey Telescope Projects
Ground-based astronomy sky surveys are massive, decades-long investments in scientific data collection. Stakeholders expect these datasets to retain scientific value well beyond the lifetime of the sky survey. However, the investments in knowledge infrastructures needed to ensure the long-term management and exploitation of sky survey data are not yet in place. How are sky survey data perceived and managed, by whom, and what are the implications for the infrastructures necessary to sustain the long-term value of data? This dissertation used semi-structured interviews, document analysis, and ethnographic fieldwork to explain how perspectives on data management differ among the stakeholder populations of two major sky surveys: the Sloan Digital Sky Survey (SDSS) and the Large Synoptic Survey Telescope (LSST). Perspectives on sky survey data cluster into two categories: “data as a process,” in which data are perceived in terms of the practices and contexts surrounding their production; and “data as a product,” in which data are perceived as objective representations of reality, divorced from their production context. Analysis reveals that these different perspectives result from stakeholders’ differing data management responsibilities throughout the research life cycle, as reflected in their professional role, career stage, and level of astronomy education. These results were used to construct a data management life cycle model for ground-based astronomy sky surveys. Stakeholders involved in day-to-day construction, operations, and processing activities perceive data as a process because they are intimately familiar with how the data are produced. In contrast, sky survey leaders perceive data as a product because of their roles as liaisons to external stakeholders. During the proposal stage, leaders must present the data as objective and accurate to secure financial support; during data release, leaders must persuade researchers to trust the data for scientific use. The tendency of sky survey leaders to regard data as a product leads them, and other stakeholders, to undervalue the workforces, funding, and other knowledge infrastructures necessary to sustain the value of scientific data. Planning for long-term data management must include stakeholders who view data as a process as well as those who view data as a product.