Lawrence Berkeley National Laboratory
GPFS HPSS Integration: Implementation Experience
- Author(s): Hazen, Damian
- et al.
In 2005, NERSC and IBM Global Services Federal began work on an integrated HSM solution using the GPFS file system and the HPSS hierarchical storage system. It was foreseen that this solution would play a key role in data management at NERSC and fill a market niche for IBM. As with many large and complex software projects, a number of unforeseen difficulties were encountered during implementation. As the effort progressed, it became apparent that DMAPI alone could not be used to tie two distributed, high-performance systems together without a serious impact on performance. This document discusses the evolution of the development effort through three stages: first, an attempt to synchronize the GPFS and HPSS namespaces relying solely on GPFS's implementation of the DMAPI specification; second, a design with more traditional HSM functionality and no synchronized namespace in HPSS; and finally an effort, still underway, that will provide traditional HSM functionality but requires features from GPFS Information Lifecycle Management (ILM) to do so in a way that is scalable and meets the needs of sites with aggressive performance requirements. The last approach makes concessions to portability by using file system features such as ILM and snapshotting in order to achieve a scalable design.