Interconnection networks synthesis and optimization
- Author(s): Zhu, Yi
- et al.
New technologies are transforming VLSI design and high-performance computing. On one hand, the growing number of processing elements, in both on-chip multi-core systems and supercomputer systems, demands high-bandwidth communication. On the other hand, system performance, usually measured by latency and power consumption, is increasingly dominated by the interconnection network. These facts raise challenges in synthesizing and optimizing interconnection networks. In this dissertation, we study methodologies and algorithms for interconnection network synthesis and optimization in both on-chip networks and supercomputer systems. We explore a wide range of network topologies and physical implementations, and evaluate the performance of multi-commodity flow (MCF) algorithms. We design efficient approximation schemes to solve different variations of MCF problems, which incorporate different practical constraints. The automated design flows explore a much larger design space than traditional methods and therefore achieve promising results.
In the study of Network-on-Chip (NoC), we optimize communication latency and power consumption, which are two competing design objectives. With an improved fully polynomial approximation algorithm, a power-optimal design of a structured 8x8 NoC can be found for given average latency constraints and communication bandwidth requirements. Our methodology explores a large number of topologies, introduces a variety of wire styles into NoC design, and incorporates latency constraints and power minimization objectives into a unified MCF model, with simultaneous optimization of network topologies, physical embedding, and interconnect wire styles. The results demonstrate the strengths of the optimized networks and indicate a clear trend in power and latency tradeoffs.
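To make the shape of such a unified model concrete, the following is a minimal sketch of a generic minimum-cost multi-commodity flow program with a power objective and an average-latency constraint. It is a textbook-style formulation, not necessarily the exact model used in the dissertation. All symbols are assumptions introduced for illustration: a directed graph $G=(V,E)$ of candidate links, commodities $k$ with source $s_k$, sink $t_k$ and demand $d_k$, per-unit-flow power cost $p_e$, link latency $\ell_e$, link capacity $c_e$, and an average latency bound $L$.

```latex
\begin{align*}
\min_{f \ge 0} \quad & \sum_{e \in E} p_e \sum_{k} f_k(e)
  && \text{(total power)} \\
\text{s.t.} \quad
& \sum_{e \in \delta^{+}(v)} f_k(e) - \sum_{e \in \delta^{-}(v)} f_k(e) =
  \begin{cases}
    d_k  & v = s_k \\
    -d_k & v = t_k \\
    0    & \text{otherwise}
  \end{cases}
  && \forall k,\ \forall v \in V \\
& \sum_{k} f_k(e) \le c_e
  && \forall e \in E \\
& \sum_{e \in E} \ell_e \sum_{k} f_k(e) \le L \sum_{k} d_k
  && \text{(average latency)}
\end{align*}
```

Here $\delta^{+}(v)$ and $\delta^{-}(v)$ denote the links leaving and entering node $v$. In this sketch, choosing among topologies and wire styles would correspond to which candidate links in $E$ end up carrying flow, with each wire style contributing its own $p_e$, $\ell_e$ and $c_e$.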
In the synthesis and optimization of networks in supercomputer systems, we use the packaging framework of the Blue Gene/L supercomputer as an example to demonstrate the advantages of our design flow, which incorporates real design issues such as board dimensions and pin numbers. Using real benchmark traces, the experiments show that the best topologies identified by our algorithm achieve better average latency than the existing 3-dimensional torus networks.
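As a point of reference for the torus baseline, the short Python sketch below computes the average hop count of a torus under uniform all-to-all traffic. This is only a rough proxy: it assumes hop count stands in for latency and uniform traffic stands in for the benchmark traces, and the function names `torus_hops` and `average_hops` are hypothetical helpers, not part of the dissertation's design flow.

```python
from itertools import product


def torus_hops(a, b, dims):
    """Minimal hop count between nodes a and b on a torus with the given dims.

    In each dimension a packet can travel in either direction around the
    ring, so the distance is the shorter of the two ways around.
    """
    return sum(min(abs(x - y), k - abs(x - y)) for x, y, k in zip(a, b, dims))


def average_hops(dims):
    """Mean hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(*(range(k) for k in dims)))
    total = sum(torus_hops(a, b, dims) for a in nodes for b in nodes if a != b)
    return total / (len(nodes) * (len(nodes) - 1))


if __name__ == "__main__":
    # Small 4x4x4 example; full-scale machines are far larger.
    print(average_hops((4, 4, 4)))
```

A candidate topology could be compared against this baseline by replacing `torus_hops` with shortest-path distances on that topology's graph.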