Open Access Publications from the University of California

College of Engineering

Computer Science, UC Davis

Applying Machine Learning to Identify NUMA End-System Bottlenecks for Network I/O

  • Author(s): Chauhan, Harshvardhan Singh
  • et al.
The data associated with this publication are available upon request.

Performance bottlenecks across distributed nodes, such as in high-performance computing grids or cloud computing, have raised concerns about the use of Non-Uniform Memory Access (NUMA) processors and high-speed commodity interconnects. Performance engineering studies have investigated this with varying degrees of success. However, with the continuous evolution of end-system hardware, along with changes in the Linux networking stack, such study has become increasingly complex and difficult because of the many tightly coupled performance tuning parameters involved. In response, we present the Networked End-System Characterization and Adaptive Tuning tool, or NESCAT, a partially automated performance engineering tool that uses machine learning to study high-speed network connectivity within end-systems. NESCAT exploits several novel techniques for systems performance engineering. These include using k-means clustering and Artificial Neural Networks (ANNs) to learn and predict network throughput performance and resource utilization for end-system networks. NESCAT differs from previously designed applications in that it applies machine learning and clustering techniques to NUMA core-binding cases. This work focuses on predicting optimal Network Interface Controller (NIC) parameters for performance predictability, which is a necessity for complex science applications. Through experiments, we demonstrate the uniqueness of this technique by achieving high accuracy between predicted and actual performance metrics such as throughput, data rate efficiency, and frame rates. Our system is able to ingest large amounts of data and produce results within 2 hours for an 8-core end-system. The root mean square error of the designed model is around 10^-1, and the model thus predicts output accurately when compared with live-run data on an actual machine.
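As a rough illustration of two techniques named in the abstract, the sketch below groups throughput samples with a plain k-means loop and scores predictions with the root mean square error. It is not NESCAT's implementation: the function names `kmeans_1d` and `rmse`, the starting centroids, and all throughput figures (in Gb/s) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def kmeans_1d(values, centroids, max_iterations=100):
    """Lloyd's k-means on scalar values, starting from the given centroids.

    Returns the final centroids and the clusters from the last assignment
    step. If the loop stops early because the centroids no longer change,
    the clusters are consistent with the returned centroids.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical throughput samples (Gb/s) for different core-binding cases.
samples = [9.1, 9.4, 9.3, 3.2, 3.0, 3.5]
centroids, clusters = kmeans_1d(samples, centroids=[3.0, 9.0])

# Hypothetical predicted vs. measured throughput (Gb/s).
error = rmse([9.0, 9.5, 3.1], [9.1, 9.4, 3.2])
print(centroids, clusters, error)
```

With these made-up numbers the samples separate into a low-throughput and a high-throughput group, and every prediction is off by 0.1 Gb/s, so the error comes out on the order of 10^-1, the same scale the abstract reports.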
