From Open Data to Knowledge Production: Biomedical Data Sharing and Unpredictable Data Reuses
- Author(s): Pasquetto, Irene V.
- Advisor(s): Borgman, Christine L.
- et al.
Using a US consortium for data sharing as the primary field site, this three-year ethnographic research project examines the socio-technical, epistemic, and ethical challenges of making biomedical research data openly available and reusable. Public policy arguments for releasing scientific data for reuse by others include increasing trust in science and leveraging public investments in research. In most types of scientific research, data release occurs in parallel with associated publications, after peer review. In the consortium studied for this project, datasets may also be released independently, without an associated publication. Such research datasets are conceptualized as “hypothesis free” resources from which novel knowledge can be extracted indefinitely. One finding of this project is that biomedical researchers do not regularly download and re-analyze “hypothesis free” research data from open repositories. Data reuse is a complex, delicate, and often time-consuming process. Metadata and ontology schemas appear to be necessary but not sufficient for data reuse processes. To test new hypotheses on “old” data, scientists depend on access to peer-reviewed primary analyses, pre-existing trusted relationships with the data creators, and shared research agendas. Data donors (patients, study participants, etc.), on the other hand, retain little control over how open research data are reused. Findings suggest that, in practice, it is impossible to predict, and consequently to regulate, how datasets might be reused once made openly available. Unintended consequences of reusing this consortium’s open data are already emerging, to the concern of some participants.