eScholarship
Open Access Publications from the University of California

Object-Centric Data Management in HPC Workflows - A Case Study

Abstract

HPC workflows consist of multiple phases and components that execute collaboratively toward a common goal. They perform the necessary computations and exchange data, often through system-wide, POSIX-compliant parallel file systems. However, POSIX file systems pose performance and scalability challenges, which has prompted the development of alternative storage systems such as object stores. Despite their potential, object stores face barriers to adoption in HPC workflows because they lack workflow awareness and because of the structured nature of HPC data. This work presents a case study that uses Proactive Data Containers (PDC), a framework for object-centric runtime data management, to support Montage, a real-world astronomy workflow that runs on HPC systems. Because PDC is deployed in user space, it can be adopted transparently alongside existing I/O libraries. This study explores the use of PDC with Montage's existing FITS-based I/O methods, discusses workflow-oriented optimizations such as caching, prefetching, and write aggregation, and provides insights and lessons learned throughout the porting process.
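Write aggregation is one of the workflow-oriented optimizations named above. The sketch below illustrates only the general idea: many small region writes to an object are buffered and coalesced into fewer, larger writes before reaching storage. It is not PDC's API or the authors' implementation. `ObjectStore`, `WriteAggregator`, `put_region`, and the `threshold_bytes` parameter are hypothetical names, and the in-memory store merely stands in for a real object store.

```python
from __future__ import annotations


class ObjectStore:
    """Hypothetical in-memory stand-in for an object store (not PDC)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytearray] = {}
        self.put_calls = 0  # counts storage operations actually issued

    def put_region(self, name: str, offset: int, data: bytes) -> None:
        """Write `data` into object `name` starting at byte `offset`."""
        obj = self.objects.setdefault(name, bytearray())
        end = offset + len(data)
        if len(obj) < end:
            obj.extend(b"\x00" * (end - len(obj)))  # grow the object, zero-filled
        obj[offset:end] = data
        self.put_calls += 1


class WriteAggregator:
    """Hypothetical aggregator: buffers small writes, flushes them coalesced."""

    def __init__(self, store: ObjectStore, threshold_bytes: int = 4096) -> None:
        self.store = store
        self.threshold = threshold_bytes
        self.pending: dict[str, list[tuple[int, bytes]]] = {}
        self.pending_bytes = 0

    def write(self, name: str, offset: int, data: bytes) -> None:
        """Buffer a write; flush automatically once the threshold is reached."""
        self.pending.setdefault(name, []).append((offset, bytes(data)))
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.threshold:
            self.flush()

    def flush(self) -> None:
        """Merge back-to-back contiguous writes, then issue one put per region."""
        for name, regions in self.pending.items():
            merged: list[tuple[int, bytes]] = []
            # Regions are kept in issue order, so overlapping writes are still
            # applied in the order the application made them.
            for off, data in regions:
                if merged and merged[-1][0] + len(merged[-1][1]) == off:
                    prev_off, prev_data = merged[-1]
                    merged[-1] = (prev_off, prev_data + data)
                else:
                    merged.append((off, data))
            for off, data in merged:
                self.store.put_region(name, off, data)
        self.pending.clear()
        self.pending_bytes = 0


if __name__ == "__main__":
    store = ObjectStore()
    agg = WriteAggregator(store, threshold_bytes=4096)
    # 64 small, contiguous 16-byte writes (e.g., rows of an image tile)
    for i in range(64):
        agg.write("tile", i * 16, bytes([i]) * 16)
    agg.flush()
    print(store.put_calls, len(store.objects["tile"]))
```

In this sketch, the 64 contiguous writes reach the store as a single 1024-byte put rather than 64 separate operations. A real system would also need to consider durability, consistency across processes, and memory limits, which this illustration ignores.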
