DataUp: A tool to help researchers describe and share tabular data
Scientific datasets have immeasurable value, but they lose their value over time without proper documentation, long-term storage, and easy discovery and access. Across disciplines as diverse as astronomy, demography, archeology, and ecology, large numbers of small, heterogeneous datasets (i.e., the long tail of data) are especially at risk unless they are properly documented, saved, and shared. One unifying factor for many of these at-risk datasets is that they reside in spreadsheets.

In response to this need, the California Digital Library (CDL) partnered with Microsoft Research Connections and the Gordon and Betty Moore Foundation to create the DataUp data management tool for Microsoft Excel. Many researchers who create these small, heterogeneous datasets use Excel at some point in their data collection and analysis workflow, so we set out to develop a data management tool that fits easily into those workflows and minimizes the learning curve for researchers.

The DataUp project began in August 2011. We first formally assessed the needs of researchers by conducting surveys and interviews of our target research groups: earth, environmental, and ecological scientists. We found that, on average, researchers had very poor data management practices, were not aware of data centers or metadata standards, and did not understand the benefits of data management or sharing. Based on our survey results, we compiled a list of desirable components and requirements and solicited feedback from the community to prioritize potential features of the DataUp tool. These requirements were then relayed to the software developers, and DataUp was successfully launched in October 2012.