Performance Analysis and Optimization for Scientific Data Workloads
eScholarship
Open Access Publications from the University of California

Abstract

Scientific data generated at experimental and observational facilities are increasingly being processed on large-scale compute systems. Most experimental data analysis workflows are not designed or implemented to run in large-scale environments or to take full advantage of HPC compute and storage resources. Unlike traditional tightly coupled scientific applications, these applications face significant performance and scalability challenges as the volume of data increases exponentially. In this paper, we conduct a performance and scalability analysis of experimental analysis applications and workflows operating on data from light sources. Our analysis detects and quantifies I/O performance, scalability, and runtime bottlenecks in three data analysis applications that run on NERSC resources. Based on our analysis, we propose and implement a set of optimizations that reduce the time spent on I/O operations by almost 90%.
