Open Access Publications from the University of California

Too big to share? Scaling up knowledge transfer workflows from little science to big science

Author(s):
  • Randles, Bernadette M.
  • Sands, Ashley E.
  • Borgman, Christine L.
  • et al.

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License

As scientific datasets outgrow available bandwidth, they become difficult to transport for processing or sharing. Scientists who require hundreds of terabytes of data for large simulations, such as those in cosmology and turbulence, need large storage capacity and fast processing to do their science. Cloud storage and high-performance computing services enable these scientific communities to conduct research, but may constrain access to results. Datasets become scattered across locations and are often described by competing metadata schemas, which limits their discoverability and retrieval by other scientists. We report preliminary findings from a case study of an infrastructure being designed for use by multiple scientific disciplines. The infrastructure is intended to store original datasets, code used to conduct analysis, and resulting datasets in a common area accessible through a web browser. Researchers will be able to share these components of their workflows by granting access to a virtual notebook. Although the notebook is not necessarily a permanent record of the research, it can be exported in many formats and referenced, and it can support multiple simulations with multiple parameter runs, all within a browser. The Jupyter (formerly IPython) notebooks will be configured and load-tested to scale with the system, in a special environment maintained on the server. We are studying how these innovative tools and infrastructures are applied and adopted across disciplines. These new workflows, which address the data size problem and consolidate a researcher’s work into one virtual area, may enable new forms of scientific collaboration. The benefits, costs, and tradeoffs of these workflows and tools will inform scientific practice and policy.

Presented at FORCE 2016 Conference
