eScholarship
Open Access Publications from the University of California

ARCHIE: Data Analysis Acceleration with Array Caching in Hierarchical Storage

Published Web Location

https://sdm.lbl.gov/pdc/pubs/201812-BigData-ARCHIE.pdf
No data is associated with this publication.
Abstract

Scientific data analysis typically involves reading massive amounts of data generated by simulations, experiments, and observations. Reading such large volumes of data from disk-based file systems is often slow because of the mechanical components in the disks. Recent supercomputing systems are adding non-volatile storage layers in a hierarchy to bridge the performance gap between fast main memory and slow disk-based storage. Software libraries that manage this hierarchy must not only read data efficiently but also reduce user involvement in cross-layer data movement. Furthermore, because scientific data is often organized in array-based data structures, these libraries need to support array access patterns in hierarchical storage management. Existing software typically manages individual storage layers, requiring significant manual effort to move data among them. In this paper, we introduce array caching in hierarchical storage (ARCHIE), a new approach that accelerates array data analysis in a seamless fashion. ARCHIE evaluates array access patterns and prefetches data with array semantics between storage layers. Our evaluation shows that ARCHIE outperforms state-of-the-art file systems, namely Lustre and DataWarp, on a production supercomputing system by up to 5.8× when scientific analysis applications access data.
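
To make the idea of pattern-driven prefetching between storage layers concrete, the following is a minimal illustrative sketch, not ARCHIE's actual algorithm or API. It assumes that an array is split into chunks addressed by integer coordinates, that two Python dictionaries stand in for a slow tier and a fast tier, and that a constant stride over the last three accesses is a good enough predictor. The names `Chunk`, `predict_next_chunk`, and `ToyArrayCache` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical chunk address: coordinates of a block in an N-d array.
Chunk = Tuple[int, ...]


def predict_next_chunk(history: List[Chunk]) -> Optional[Chunk]:
    """Return the next chunk if the last two steps share a nonzero stride."""
    if len(history) < 3:
        return None
    a, b, c = history[-3:]
    stride_ab = tuple(y - x for x, y in zip(a, b))
    stride_bc = tuple(y - x for x, y in zip(b, c))
    # No prediction when the pattern is irregular or the reader is not moving.
    if stride_ab != stride_bc or not any(stride_ab):
        return None
    return tuple(x + s for x, s in zip(c, stride_ab))


class ToyArrayCache:
    """Two-tier toy model: dicts stand in for slow and fast storage layers."""

    def __init__(self, slow_tier: Dict[Chunk, bytes]) -> None:
        self.slow_tier = slow_tier
        self.fast_tier: Dict[Chunk, bytes] = {}
        self.history: List[Chunk] = []
        self.hits = 0
        self.misses = 0

    def read(self, chunk: Chunk) -> bytes:
        if chunk in self.fast_tier:
            self.hits += 1
            data = self.fast_tier[chunk]
        else:
            # Miss: fetch from the slow tier and keep a copy in the fast tier.
            self.misses += 1
            data = self.slow_tier[chunk]
            self.fast_tier[chunk] = data
        self.history.append(chunk)
        # Prefetch the predicted next chunk, if it exists and is not cached.
        nxt = predict_next_chunk(self.history)
        if nxt is not None and nxt in self.slow_tier and nxt not in self.fast_tier:
            self.fast_tier[nxt] = self.slow_tier[nxt]
        return data


if __name__ == "__main__":
    slow = {(0, j): bytes([j]) for j in range(6)}
    cache = ToyArrayCache(slow)
    for j in range(6):
        cache.read((0, j))
    print(cache.hits, cache.misses)
```

In this toy run, the first three reads along a row miss while the stride is being established, and the remaining three are served from the fast tier. A real system would also need eviction, asynchronous data movement, and multi-dimensional region handling, none of which are modeled here.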
