eScholarship
Open Access Publications from the University of California

Efficient clustered server-side data analysis workflows using SWAMP

Abstract

Technology continues to enable scientists to set new records in data collection and production, intensifying the need for large-scale tools that can efficiently process and analyze the growing mountain of data. To complement growth in the number of data centers and the volume of data they store, we introduce our Script Workflow Analysis for MultiProcessing (SWAMP) system. SWAMP provides safe server-side processing capabilities that allow scientists to reuse familiar desktop-based analysis methods expressed as shell scripts. Built-in script compilation isolates file accesses and generates workflows, while a cluster-capable execution engine partitions and executes the resulting workflows. Benchmarks show performance gains of up to 20x and illustrate the importance of I/O considerations, which make other computation systems less effective at geoscience data reduction.
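The idea of deriving a workflow from a script's file accesses can be illustrated with a minimal sketch. It is not SWAMP's implementation: the function names (`build_workflow`, `schedule_levels`), the tuple representation of a command, and the file names are all hypothetical, and the sketch assumes that each command's input and output files have already been extracted from the script. Each command depends on whichever earlier command last wrote one of its inputs, and commands with no unmet dependencies are grouped into levels that could run in parallel.

```python
def build_workflow(commands):
    """Map each command name to the set of commands it depends on.

    `commands` is an ordered list of (name, input_files, output_files)
    tuples, a hypothetical stand-in for parsed script lines.
    """
    last_writer = {}  # file name -> command that most recently wrote it
    deps = {}
    for name, inputs, outputs in commands:
        # A command depends on the producer of each input it reads.
        deps[name] = {last_writer[f] for f in inputs if f in last_writer}
        for f in outputs:
            last_writer[f] = name
    return deps


def schedule_levels(deps):
    """Group commands into levels; each level's commands are independent."""
    remaining = dict(deps)
    done = set()
    levels = []
    while remaining:
        # Ready commands are those whose dependencies have all finished.
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cyclic dependency between commands")
        levels.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return levels


# Hypothetical three-step analysis: two averages, then a difference.
script = [
    ("avg_jan", ["jan.nc"], ["jan_avg.nc"]),
    ("avg_feb", ["feb.nc"], ["feb_avg.nc"]),
    ("diff", ["jan_avg.nc", "feb_avg.nc"], ["diff.nc"]),
]
workflow = build_workflow(script)
levels = schedule_levels(workflow)
```

In this sketch the two averaging commands share no files, so they land in the same level, while the difference step waits for both. A cluster-capable engine would additionally have to decide where each command runs and how intermediate files move between nodes, which this sketch does not attempt.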
