eScholarship
Open Access Publications from the University of California

Hunting data rogues at scale: Data quality control for observational data in research infrastructures

Author(s):
  • Pastorello, G
  • Gunter, D
  • Chu, H
  • Christianson, D
  • Trotta, C
  • Canfora, E
  • Faybishenko, B
  • Cheah, YW
  • Beekwilder, N
  • Chan, S
  • Dengel, S
  • Keenan, T
  • O'Brien, F
  • Elbashandy, A
  • Poindexter, C
  • Humphrey, M
  • Papale, D
  • Agarwal, D
  • et al.
Abstract

© 2017 IEEE. Data quality control is one of the most time-consuming activities within Research Infrastructures (RIs), especially when it involves observational data and multiple data providers. In this work we report on our ongoing development of data rogues, a scalable approach to managing data quality issues for observational data within RIs. The motivation for this work began with the creation of the FLUXNET2015 dataset, which includes carbon, water, and energy fluxes plus micrometeorological and ancillary data measured at over 200 sites around the world. Creating a uniform dataset, including derived data products, required extensive data quality control work. The unpredictable nature of observational data quality issues makes automating data quality control inherently difficult. Developed from this experience, the data rogues methodology allows for increased automation of quality control activities by systematically identifying, cataloging, and documenting implementations of solutions to data issues. We believe this methodology can be extended and applied to other domains and types of data, making the automation of data quality control a more tractable problem.
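The abstract describes data rogues as a way to identify, catalog, and document solutions to recurring data issues. The following is a minimal hypothetical sketch of that idea, not the authors' implementation: each cataloged issue ("rogue") pairs a detection test with a documented fix, and a catalog applies every known rogue to incoming records. The class names, the `TA` (air temperature) field, the -9999 missing-value sentinel, and the Kelvin-for-Celsius check with its 200 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A record is one timestamped row of variables (hypothetical schema).
Record = Dict[str, Optional[float]]

MISSING = -9999.0  # assumed missing-value sentinel


@dataclass
class Rogue:
    """One cataloged data issue: a detection test plus its documented fix."""
    rogue_id: str
    description: str
    detect: Callable[[Record], bool]
    fix: Callable[[Record], Record]


@dataclass
class RogueCatalog:
    """Registry of known rogues, applied in registration order."""
    rogues: List[Rogue] = field(default_factory=list)

    def register(self, rogue: Rogue) -> None:
        self.rogues.append(rogue)

    def apply(self, records: List[Record]) -> Tuple[List[Record], Dict[str, int]]:
        """Return fixed copies of the records and how often each rogue fired."""
        counts = {r.rogue_id: 0 for r in self.rogues}
        fixed = []
        for rec in records:
            current = dict(rec)  # never mutate the provider's original data
            for rogue in self.rogues:
                if rogue.detect(current):
                    current = rogue.fix(current)
                    counts[rogue.rogue_id] += 1
            fixed.append(current)
        return fixed, counts


catalog = RogueCatalog()
catalog.register(Rogue(
    rogue_id="missing-sentinel",
    description="Sentinel value used for missing data; replace with None.",
    detect=lambda r: any(v == MISSING for v in r.values()),
    fix=lambda r: {k: (None if v == MISSING else v) for k, v in r.items()},
))
catalog.register(Rogue(
    rogue_id="kelvin-as-celsius",
    description="Air temperature TA reported in K instead of degC.",
    # Assumed plausibility threshold: no real air temperature exceeds 200 degC.
    detect=lambda r: r.get("TA") is not None and r["TA"] > 200,
    fix=lambda r: {**r, "TA": round(r["TA"] - 273.15, 2)},
))

records = [{"TA": 21.5}, {"TA": MISSING}, {"TA": 295.15}]
fixed, counts = catalog.apply(records)
print(fixed)   # [{'TA': 21.5}, {'TA': None}, {'TA': 22.0}]
print(counts)  # {'missing-sentinel': 1, 'kelvin-as-celsius': 1}
```

Because each fix is registered alongside its detector and a human-readable description, a newly encountered issue becomes a catalog entry that is reapplied automatically to later submissions. This is one plausible reading of how cataloging solutions can increase automation; the per-rogue counts also give a simple record of which issues affected which data.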
