Multi-core processors have emerged as the leading solution to the
power and scalability concerns that processor designers currently face. This
transition addresses microarchitectural scalability issues, but it only delays
the onset of the power scalability problem. Due to limitations on threshold
voltage scaling, within a few process generations processors will be able to
use only a small fraction of a silicon die at full frequency at once. This
“utilization wall” will prevent massively multi-core processors from
effectively employing more than a small subset of cores at once. If we cannot
utilize the full array of homogeneous cores, then the utility of building them
comes into question. This paper explores massively heterogeneous chip
multiprocessors (CMPs), an approach to processor design that can continue to
scale performance in spite of
the utilization wall. Such designs will comprise tens, hundreds, or even
thousands of heterogeneous specialized processing elements (SPEs), ranging
from small ASIC circuits to large speculative out-of-order general-purpose
processors.
Massively heterogeneous CMPs combine these SPEs with an execution model that
allows each part of a program to run on the SPE that can execute it most
efficiently. Although the utilization wall dictates that massively
heterogeneous CMPs (like all future processors) may use only a small fraction
of the die at once, they use that fraction very efficiently. This paper
explores the architectural challenges that arise in designing general-purpose
massively heterogeneous CMPs. Our results demonstrate that massively
heterogeneous systems can extend performance scaling by realizing large gains
(up to 7×) in performance and efficiency relative to more modestly
heterogeneous and homogeneous designs. The paper also presents an ASIC-based
SPE case study that demonstrates the ability of such systems to provide large
efficiency gains even for irregular integer applications.
Pre-2018 CSE ID: CS2009-0947