eScholarship
Open Access Publications from the University of California

Exploration of cache behavior using HPSS per-file transfer logs

Abstract

We assembled 18 months of transfer logs from a production High Performance Storage System (HPSS) at the National Energy Research Scientific Computing Center (NERSC) and analyzed them to assess workload behavior and to gain insight into which cache configurations would provide the best service to users. We found, as expected, that the workload is distributed over file size with the number of files declining as files get larger, so the amount of space consumed per file-size increment is roughly constant up to file sizes of 1 GB. Sixty-one percent of file accesses were writes. A significant number of the files written, such as backup files and similar files, are never read. For files of all sizes, access frequency declines with the age of the file. HPSS uses the cache as an I/O buffer for incoming data, and at our installation the cache behavior is dominated by the write traffic. Cache lifetimes tend to scale linearly with the size of the cache and inversely with the amount of data flow.
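
The scaling stated in the abstract's final sentence can be illustrated with a minimal sketch. It assumes a steady inflow and a cache that purges its oldest data first, so that lifetime is simply cache size divided by inflow rate. The function name, the units, and the sample numbers are hypothetical and are not taken from the NERSC logs or from HPSS itself.

```python
def estimated_cache_lifetime_days(cache_size_tb: float, inflow_tb_per_day: float) -> float:
    """Rough time a file stays in the cache before being purged.

    Assumption (not from the source): steady inflow and oldest-first
    purging, so lifetime = cache size / inflow rate.
    """
    if inflow_tb_per_day <= 0:
        raise ValueError("inflow_tb_per_day must be positive")
    return cache_size_tb / inflow_tb_per_day


if __name__ == "__main__":
    # Hypothetical (cache size in TB, inflow in TB/day) pairs for illustration only.
    for size_tb, flow_tb in [(10.0, 2.0), (20.0, 2.0), (10.0, 4.0)]:
        days = estimated_cache_lifetime_days(size_tb, flow_tb)
        print(f"cache={size_tb} TB, inflow={flow_tb} TB/day -> about {days} days")
```

Under these assumptions, doubling the cache size doubles the estimated lifetime and doubling the inflow halves it, which matches the qualitative trend the abstract reports.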
