For decades computer architects have taken advantage of Moore's law to get bigger, faster, and more energy-efficient chips "for free," reaping the benefits of silicon process improvements and shrinking technology nodes. Each new technology node brought exponentially more transistors, balanced by exponentially lower transistor switching power, allowing the power budget for a fixed silicon area to remain relatively constant. Architects could count on more transistors---and use them to build more complex designs---without substantially increasing the total power budget for a chip.
Today, however, rising CMOS leakage currents have limited further reductions in supply voltage, leading to a power-limited utilization wall and an end to classical Dennard scaling. This breakdown results in a new regime of dark silicon, in which vast swaths of silicon area must remain "dark" (powered down or under-clocked) most of the time. Architects must turn to novel approaches to squeeze ever more performance out of every last square millimeter of silicon.
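To first order, this breakdown can be sketched with the standard textbook scaling model. This is an illustration under stated assumptions, not necessarily the exact formulation used elsewhere in this dissertation. Let $S > 1$ be the linear scaling factor between successive technology nodes (a symbol introduced here for illustration), and assume that the transistor count in a fixed area grows as $S^2$, the switching frequency $f$ grows as $S$, and the per-transistor capacitance $C$ shrinks as $1/S$. Dynamic power per transistor is proportional to $C V_{dd}^2 f$. Under classical Dennard scaling the supply voltage $V_{dd}$ also shrinks as $1/S$, so the power for a fixed area stays constant:
\[
P_{\mathrm{area}} \;\propto\; \underbrace{S^2}_{\text{transistors}} \cdot \underbrace{\frac{1}{S}}_{C} \cdot \underbrace{\frac{1}{S^2}}_{V_{dd}^2} \cdot \underbrace{S}_{f} \;=\; 1 .
\]
If leakage instead holds $V_{dd}$ roughly fixed, the same product becomes
\[
P_{\mathrm{area}} \;\propto\; S^2 \cdot \frac{1}{S} \cdot 1 \cdot S \;=\; S^2 ,
\]
so under a fixed power budget the fraction of the area that can switch at full frequency falls roughly as $1/S^2$ per generation. With $S \approx 1.4$, for example, only about half of the transistors can be active at full speed after one generation under this simplified model.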
This dissertation demonstrates that one viable approach to the dark silicon problem is specialization. Rather than relying solely on bigger, faster, general-purpose processors, chip architects have been increasingly augmenting their systems with special-purpose accelerators. These accelerators can speed up a given computation, allow it to run with less energy, or both. Using less energy frees up power and thermal budgets, allowing more computations to run in parallel and extending the computational capabilities we've come to demand from silicon.
This dissertation presents two such specialized architectures. The first is GreenDroid, a mobile application processor built with custom accelerators targeting Android. The accelerators are energy-saving specialized circuits called conservation cores, or c-cores. In a 45-nm process, just 7 square millimeters of silicon dedicated to c-cores covers approximately 95% of our Android workload. Powered by c-cores, GreenDroid uses 11x less energy on average than a general-purpose CPU.
The second is Pixel Visual Core, a commercial accelerator from Google that enables energy-efficient computational photography and machine learning in the Pixel 2 and Pixel 3 smartphones. Pixel Visual Core is powered by an 8-core Image Processing Unit with 4,096 16-bit ALUs, capable of performing 3.1 tera-operations per second while drawing under 5 watts. Compared to a 10-nm general-purpose application processor, the 28-nm Pixel Visual Core runs key compute kernels 3-6x faster and with 7-16x less energy.