A Holistic Design Model for Energy-Efficient AI-Enabled Data Centers
- Author(s): Tavakkoli, Fatemeh
- Advisor(s): Tan, Sheldon
- et al.
With the increasing power density of the latest generation of AI-enabled server racks (e.g., 30-80 kW), along with the growing adoption of such servers for deep learning tasks such as image understanding and speech recognition, the energy and cooling requirements of AI-enabled data centers are growing at an alarming rate. In recent years, data center architects have emphasized the importance of holistic approaches to the design and operation of data centers that consider both IT equipment and cooling infrastructure; however, comprehensive design approaches for energy-efficient data centers remain very limited. This paper details the development of a tool for the energy-efficient design of a data center based on workload predictions, IT configurations, and climatic conditions. Because actual measured data are typically not available at the design stage, the evaluation of data center energy efficiency, in terms of power usage effectiveness (PUE), relies solely on analytical models. Therefore, in this paper, a thermodynamics-based physical model of an AI-enabled data center configuration, including the cooling system, is developed and benchmarked against an actual data center. This data center is located at a colocation facility and consists mainly of GPU server nodes whose primary application is AI/deep learning research. The colocation center is equipped with a chiller system that supplies a building chilled water loop, which is used to cool the airflow of the computer room air handler (CRAH) units in each data center room. The chilled air from the CRAH units is supplied to an under-floor plenum and enters the data center room through perforated tiles to provide sufficient cooling to the server racks in the cold aisle containment. For benchmarking the data center, power, coolant flow rate, and temperature data are obtained either through the equipment's monitoring system or via sensor measurements.
A comprehensive PUE analysis can then be performed to determine the most energy-efficient cooling system design and to find the optimal flow rate and temperature set points while meeting design constraints such as geographical location and weather conditions.
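To make the two quantities central to the abstract concrete, the following minimal Python sketch computes PUE from its standard definition (total facility power divided by IT power) and estimates the airflow needed to remove a rack's heat load from the steady-state energy balance Q = m_dot * cp * delta_T. It is an illustration only, not the model developed in the paper: the function names, the air property constants, and all numerical inputs are assumptions chosen for the example.

```python
# Illustrative sketch only: names and numbers are hypothetical, not taken from the paper.

AIR_DENSITY = 1.2   # kg/m^3, approximate air density near room conditions (assumed)
AIR_CP = 1005.0     # J/(kg*K), approximate specific heat of air (assumed)


def pue(it_power_kw, cooling_power_kw, other_power_kw=0.0):
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_power_kw <= 0:
        raise ValueError("IT power must be positive")
    return (it_power_kw + cooling_power_kw + other_power_kw) / it_power_kw


def required_airflow_m3s(heat_load_kw, delta_t_k):
    """Volumetric airflow (m^3/s) needed to remove heat_load_kw of heat
    when the air warms by delta_t_k across the rack.

    Based on the steady-state energy balance Q = m_dot * cp * delta_T.
    """
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_s = heat_load_kw * 1000.0 / (AIR_CP * delta_t_k)
    return mass_flow_kg_s / AIR_DENSITY


if __name__ == "__main__":
    # Hypothetical facility: 400 kW IT load, 120 kW cooling, 20 kW other overhead.
    print(f"PUE: {pue(400.0, 120.0, 20.0):.2f}")
    # Hypothetical 30 kW rack with a 12 K air temperature rise.
    print(f"Airflow: {required_airflow_m3s(30.0, 12.0):.2f} m^3/s")
```

In a design study of the kind described above, functions like these would be evaluated over candidate flow rates and temperature set points, with the cooling power itself coming from a model of the chiller and CRAH units rather than from a fixed input.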