Reducing the development cost of customized hardware acceleration for cloud infrastructure
- Author(s): Khazraee, Moein
- Advisor(s): Schulman, Aaron
- et al.
Customized hardware accelerators have made it possible to meet increasing workload demands in cloud computing by customizing the hardware to a specific application. They are needed because the cost and energy efficiency of general-purpose processors has plateaued. However, creating a custom hardware accelerator for an application takes several months for development and requires upfront development costs in the order of millions of dollars. These constraints have limited their use to applications that have sufficient maturity and scale to justify a large upfront investment. For instance, Google uses customized hardware accelerators to process voice searches for half a billion Google Assistant customers, and Microsoft uses programmable customized hardware accelerators to answer queries for ~100 million Bing search users. Reducing development costs makes it possible to use hardware accelerators on applications that have moderate scale or change over time.
In this dissertation, I demonstrate that it is feasible to reduce the development costs of custom hardware accelerators in cloud infrastructure. Specifically, the following three frameworks reduce development cost for the three main parts of the cloud infrastructure. For computation inside data centers, I built a bottom-up framework that considers different design parameters of fully customized chips and servers to find the optimal total cost solution. This solution balances operational, fixed and development costs. Counter-intuitively, I demonstrate that older silicon technology nodes can provide better cost efficiency for moderate applications. For in-network computations, I built a framework that reduces development cost by
offloading the control portion of an application-specific hardware accelerator to modest processors inside programmable customized hardware. I demonstrate that this framework can achieve throughput of ~200 Gbps for the compute-intensive task of deep packet inspection. For base stations at the cloud edge, I built a flexible framework on top of software-defined radios which significantly reduces their required computation performance and bandwidth. I show that it is possible to backhaul the entire 100 MHz of the 2.4 GHz ISM band over only 224 Mbps instead of 3.2 Gbps; making it possible to decode BLE packets in software with requirement of a wimpy embedded processor.