Background
The testing of theoretical models with experimental data is an integral part of the scientific method, and a logical place to search for new ways of stimulating scientific productivity. Often experiment/theory comparisons may be viewed as a workflow comprised of well-defined, rote operations distributed over several distinct computers, as exemplified by the way in which predictions from electronic structure theories are evaluated with results from spectroscopic experiments. For workflows such as this, which may be laborious and time consuming to perform manually, software that could orchestrate the operations and transfer results between computers in a seamless and automated fashion would offer major efficiency gains. Such tools also promise to alter how researchers interact with data outside their field of specialization by, e.g., making raw experimental results more accessible to theorists, and the outputs of theoretical calculations more readily comprehended by experimentalists.Results
An implementation of an automated workflow has been developed for the integrated analysis of data from nuclear magnetic resonance (NMR) experiments and electronic structure calculations. Kepler (Altintas et al. 2004) open source software was used to coordinate the processing and transfer of data at each step of the workflow. This workflow incorporated several open source software components, including electronic structure code to compute NMR parameters, a program to simulate NMR signals, NMR data processing programs, and others. The Kepler software was found to be sufficiently flexible to address several minor implementation challenges without recourse to other software solutions. The automated workflow was demonstrated with data from a [Formula: see text] NMR study of uranyl salts described previously (Cho et al. in J Chem Phys 132:084501, 2010).Conclusions
The functional implementation of an automated process linking NMR data with electronic structure predictions demonstrates that modern software tools such as Kepler can be used to construct programs that comprehensively manage complex, multi-step scientific workflows spanning several different computers. Automation of the workflow can greatly accelerate the pace of discovery, and allows researchers to focus on the fundamental scientific questions rather than mastery of specialized software and data processing techniques. Future developments that would expand the scope and power of this approach include tools to standardize data and associated metadata formats, and the creation of interactive user interfaces to allow real-time exploration of the effects of program inputs on calculated outputs.