DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File Systems
Published Web Location
https://sdm.lbl.gov/oapapers/ccgrid19-kim.pdf
Abstract
In high-performance computing (HPC), storage is a shared resource used by many users whose applications have different requirements and who have different levels of storage expertise. Consequently, the optimal storage configuration varies with the I/O behavior of each application. While system logs are helpful for understanding storage behavior, it is non-trivial for each user to analyze the logs and adjust complex configurations. Even for experienced users, it is difficult to understand the full I/O stack and find the optimal configuration for a specific application. In this work, we analyzed the I/O activities of CORI, an HPC system at the National Energy Research Scientific Computing Center (NERSC). Our analysis shows that most users do not adjust storage configurations and instead use the default settings. It also shows that only a few applications are executed repeatedly in the HPC environment. Based on these results, we have developed DCA-IO, a dynamic distributed file system configuration adjustment algorithm that uses system log information and widely adopted rules to adjust storage configurations automatically, without any user intervention. DCA-IO relies on existing system logs and requires neither code modifications nor an additional library. To demonstrate the effectiveness of DCA-IO, we performed experiments using the I/O kernels of real applications in both an isolated, small-scale Lustre environment and CORI. Our experimental results show that our scheme can improve the performance of HPC applications by up to 75% in the isolated environment and 50% in a real HPC environment, without user intervention.