Open Access Publications from the University of California

ARCHIE: Data Analysis Acceleration with Array Caching in Hierarchical Storage

  • Author(s): Dong, B; Wang, T; Tang, H; Koziol, Q; Wu, K; Byna, S
  • Editor(s): Abe, Naoki; Liu, Huan; Pu, Calton; Hu, Xiaohua; Ahmed, Nesreen; Qiao, Mu; Song, Yang; Kossmann, Donald; Liu, Bing; Lee, Kisung; Tang, Jiliang; He, Jingrui; Saltz, Jeffrey S; et al.


© 2018 IEEE. Scientific data analysis typically involves reading massive amounts of data generated by simulations, experiments, and observations. Reading such large volumes of data from disk-based file systems often performs poorly because of the slow mechanical components in the disks. Recent supercomputing systems are adding non-volatile storage layers in a hierarchy to bridge the performance gap between fast main memory and slow disk-based storage. Software libraries that manage this hierarchy must not only read data efficiently but also reduce user involvement in cross-layer data movement. Furthermore, because scientific data is often organized in array-based data structures, these libraries need to incorporate array data access patterns into hierarchical storage management. Existing software typically manages individual storage layers, requiring significant manual effort to move data among them. In this paper, we introduce array caching in hierarchical storage (ARCHIE), a new approach that accelerates array data analysis in a seamless fashion. ARCHIE evaluates array access patterns and prefetches data with array semantics between storage layers. Our evaluation shows that ARCHIE outperforms state-of-the-art file systems, namely Lustre and DataWarp, on a production supercomputing system by up to 5.8× in data access by scientific analysis applications.
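
The idea of evaluating array access patterns and prefetching between storage layers can be illustrated with a toy sketch. The code below is not ARCHIE's implementation or API; the class `ChunkPrefetchCache`, its parameters (`slow_read`, `grid`, `capacity`), and the simple "same step twice in a row" stride detector are all assumptions made for illustration. It models a fast layer as an LRU cache of array chunks and prefetches the next chunk along a detected stride from a slow layer.

```python
from collections import OrderedDict


class ChunkPrefetchCache:
    """Toy two-layer cache keyed by chunk coordinates (hypothetical, not ARCHIE)."""

    def __init__(self, slow_read, grid, capacity=4):
        self.slow_read = slow_read  # callable: chunk coordinate -> data (slow layer)
        self.grid = grid            # number of chunks along each array dimension
        self.capacity = capacity    # maximum chunks held in the fast layer
        self.fast = OrderedDict()   # fast layer in LRU order, oldest first
        self.last = None            # previously requested chunk coordinate
        self.stride = None          # previously observed step between requests
        self.hits = 0
        self.misses = 0

    def _in_bounds(self, coord):
        return all(0 <= c < n for c, n in zip(coord, self.grid))

    def _load(self, coord):
        """Bring a chunk into the fast layer, evicting the oldest if full."""
        if coord in self.fast:
            self.fast.move_to_end(coord)
            return
        self.fast[coord] = self.slow_read(coord)
        while len(self.fast) > self.capacity:
            self.fast.popitem(last=False)

    def read(self, coord):
        """Return the chunk at `coord`, prefetching along a repeated stride."""
        if coord in self.fast:
            self.hits += 1
        else:
            self.misses += 1
        self._load(coord)
        data = self.fast[coord]
        if self.last is not None:
            step = tuple(c - p for c, p in zip(coord, self.last))
            # The same nonzero step seen twice in a row counts as a pattern.
            if step == self.stride and any(step):
                nxt = tuple(c + s for c, s in zip(coord, step))
                if self._in_bounds(nxt):
                    self._load(nxt)
            self.stride = step
        self.last = coord
        return data


if __name__ == "__main__":
    # Row-wise scan over a 1 x 8 grid of chunks; the lambda stands in for slow storage.
    cache = ChunkPrefetchCache(lambda c: ("chunk", c), grid=(1, 8), capacity=4)
    for j in range(5):
        cache.read((0, j))
    print(cache.hits, cache.misses)  # prints "2 3"
```

In this sketch the first three reads miss while the stride is being learned, and later reads along the same row are served from the fast layer. A real system would also need multi-dimensional pattern analysis, asynchronous data movement, and consistency handling, none of which are modeled here.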
