Optimization of Heterogeneous NoC for Fused CPU-GPU Architecture
Heterogeneous computing architectures that combine CPUs and GPUs have become a prevailing trend. Several products from AMD, Intel, and NVIDIA fuse the CPU and GPU on the same chip. In such architectures, different processing elements (PEs), including many CPU cores, GPU cores, memory controllers (MCs), and caches, are connected through a common interconnect. CPUs and GPUs exhibit different network behaviors: the CPU tends to be latency-sensitive, whereas the GPU, with its high thread-level parallelism (TLP), tends to be throughput-hungry. Using a homogeneous interconnect for such heterogeneous processors can degrade performance and increase power consumption. This dissertation focused on designing a heterogeneous mesh-style network-on-chip (NoC) to connect heterogeneous CPU-GPU processors while accounting for their diametrically opposed network demands.
Designing a 2D mesh NoC involves several decisions. The first is the placement of the PEs within the mesh. The second is the setting of the NoC parameters: the router buffer sizes, the number of virtual channels, and the link bandwidths. This dissertation tackled all of these problems simultaneously. Moreover, to design a heterogeneous NoC, heterogeneity was explored at the level of individual router ports and links: each port of each router can have a different buffer size and number of virtual channels, and each link can have a different bandwidth. This greatly enlarges the design space and makes it very difficult to explore all possible design combinations by simulation.
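To give a rough sense of how quickly this design space grows, the following Python sketch counts candidate designs for a small mesh. The function names, the mesh dimensions, and the number of options per parameter are hypothetical values chosen for illustration and are not taken from the dissertation. The placement count treats every PE as distinct, so it is only an upper bound when some PEs (for example, identical CPU cores) are interchangeable.

```python
from math import factorial


def mesh_ports_and_links(rows, cols):
    """Count router input ports and unidirectional links in a rows x cols 2D mesh."""
    # Each pair of neighbouring routers is joined by two unidirectional links.
    links = 2 * (rows * (cols - 1) + cols * (rows - 1))
    # Each router has one input port per incoming link plus one local (PE) port.
    ports = links + rows * cols
    return ports, links


def design_space_size(rows, cols, buf_opts, vc_opts, bw_opts):
    """Upper bound on the number of designs (illustrative assumption, not the author's count).

    buf_opts, vc_opts: hypothetical number of buffer-size / virtual-channel choices per port.
    bw_opts: hypothetical number of bandwidth choices per link.
    """
    ports, links = mesh_ports_and_links(rows, cols)
    placements = factorial(rows * cols)  # one PE per router, all PEs treated as distinct
    return placements * (buf_opts * vc_opts) ** ports * bw_opts ** links


if __name__ == "__main__":
    ports, links = mesh_ports_and_links(4, 4)
    print(f"4x4 mesh: {ports} ports, {links} links")
    size = design_space_size(4, 4, buf_opts=4, vc_opts=4, bw_opts=3)
    print(f"candidate designs (upper bound): about 10^{len(str(size)) - 1}")
```

Even with only a handful of options per port and link, the count is far beyond what exhaustive simulation could cover, which is the motivation for a heuristic search.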
In this dissertation, heuristic-based optimization methods were proposed to obtain a near-optimal heterogeneous NoC design. The first is a method based on a Genetic Algorithm (GA) that searches for the design with the best performance, measured as the average network latency. To evaluate the performance of a candidate design, an analytical model based on queueing theory that supports virtual channels was proposed. The second is a multi-objective method based on the Strength Pareto Evolutionary Algorithm 2 (SPEA2) that searches for designs that are optimal in terms of both the performance and the power of the NoC. To estimate the power of a candidate design, an activity-based power model was also proposed. The resulting designs were validated using a full-system simulator.
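The multi-objective formulation rests on Pareto dominance: one design is preferred over another only if it is no worse in both latency and power and strictly better in at least one. The sketch below illustrates that relation only; it is not SPEA2 itself, and the function names and the (latency, power) values are hypothetical numbers invented for illustration.

```python
def dominates(a, b):
    """Return True if design a Pareto-dominates design b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the designs that no other design dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


if __name__ == "__main__":
    # Hypothetical (average latency, power) pairs for five candidate designs.
    designs = [(20.0, 5.0), (25.0, 4.0), (22.0, 6.0), (30.0, 3.5), (26.0, 4.5)]
    for latency, power in pareto_front(designs):
        print(f"latency={latency}, power={power}")
```

In this made-up example, three designs remain on the front, and choosing among them is a trade-off between latency and power rather than a single optimum.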