- Ding, Nan;
- Williams, Samuel;
- Nam, Hai Ah;
- Groves, Taylor;
- Awan, Muaaz Gul;
- Lindsey, LeAnn;
- Daley, Christopher;
- Selvitopi, Oguz;
- Oliker, Leonid;
- Wright, Nicholas
Tightly-coupled HPC systems have rigid memory allocation and can result in expensive memory resource underutilization. As novel memory and network technologies mature, disaggregated memory systems are becoming a promising solution for future HPC systems. It allows workloads to use the available memory of the entire system. In this paper, we propose a design framework to explore the disaggregated memory system design space. The framework incorporates memory capacity, network bandwidth, and local and remote memory access ratio, and provides an intuitive approach to guide machine configurations based on technology trends and workload characteristics. We apply our framework to analyze eleven workloads from five computational scenarios, including AI training, data analysis, genomics, protein, and traditional HPC. We demonstrate the ability of our methodology to understand the potential and pitfalls on a disaggregated memory system and motivate machine configurations. Our methodology shows that the 10 out of our 11 applications/workflows can leverage disaggregated memory without affecting performance.