The Distribution of Data Management Responsibility within Scientific Research Groups
- Author(s): Wallis, Jillian C.
- Advisor(s): Borgman, Christine L
- et al.
Scientific data often are expensive to produce or impossible to reproduce. Those data may be of great future value for reuse, recombination, and replication by other researchers. However, the potential value of these data can only be achieved if the data producers manage them properly. Visions of data management and the role of the data producer have been constructed by data curators and funders from the top down, but we have little understanding of what data management looks like on the ground. What do data producers see as their data management responsibilities? The exploratory research reported in this dissertation provides a rich description of data management tasks performed by members of six research groups and of members' perceptions of data management responsibilities. Groups were selected from the Center for Embedded Networked Sensing (CENS), an NSF-funded Science and Technology Research Center, where researchers are already experiencing the data deluge. Document analysis, semi-structured interviews, and field observations were coded and analyzed for emergent themes and used to construct models of data management practices. Significant findings include: (i) these six research groups acquired a diverse array of data, (ii) a generalized data life cycle can be applied to practices of these groups, (iii) researchers actively managed their data throughout the data life cycle to support their own use, and (iv) data management tasks were distributed among the members of a research group and were tied to data handling tasks such as collection, processing, and analysis. The data management tasks performed by researchers are categorized into four core functions: selection for quality, verification for validity, storage for accessibility, and documentation for interpretability. A set of roles and responsibilities was identified for the data producers collaborating on each research project.
These findings suggest that including author contribution statements in publications would assist future users of those data in determining whom to contact with questions about the data's creation and context. This study reveals how, when, and why science and technology researchers manage their data and makes recommendations for data management within research groups that will make data more usable and sharable.