eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Customized Computing and Machine Learning

Abstract

Nowadays, abundant data across various domains necessitate high-performance computing capabilities. While scaling clock frequency used to meet this need, the breakdown of Dennard scaling has rendered that approach obsolete. Domain-Specific Accelerators (DSAs), by contrast, have attracted growing interest because they deliver high performance while remaining energy efficient. Several factors contribute to this: 1) they can use specialized data types and operations, 2) they offer massive parallelism, 3) memory accesses can be customized, 4) customizing the control/data path amortizes the overhead of fixed instructions, and 5) the algorithm can be co-designed with the hardware.

Unfortunately, despite the huge speedups that DSAs can deliver over general-purpose processors, their programmability has not caught up. In the past few decades, High-Level Synthesis (HLS) tools were introduced to raise the abstraction level and free designers from circuit-level architectural details. While HLS can significantly reduce the effort of hardware architecture design, not every HLS program yields optimal performance; designers must still articulate the most suitable microarchitecture for the target application. This lengthens design turnaround times, since there are more choices to explore at a higher abstraction level. Moreover, this limitation has confined the DSA community primarily to hardware designers, impeding widespread adoption. This dissertation endeavors to alleviate the problem by combining customized computing and machine learning. Consequently, the dissertation consists of two core parts: 1) customized computing tailored for machine learning applications, and 2) machine learning employed to automate the optimization of customized computing. We focus on FPGAs because their cost-effectiveness and rapid prototyping capabilities make them especially suitable for this research.

The large amounts of data available in data centers have motivated researchers to develop machine learning algorithms for processing them. Since a significant portion of that data takes the form of images or graphs, we direct our attention to two prominent algorithms for such tasks: Convolutional Neural Networks (CNNs) and Graph Convolutional Networks (GCNs). In the first part of the dissertation, we develop architecture templates for accelerating these applications. This shortens the development cycle: module templates are instantiated with customizable parameters chosen for the specific target application.

In the second part of the dissertation, we shift our focus to general applications and automate their optimization steps, including design space exploration and performance/area modeling. To do so, we structure the problem so that it can be fed into learning algorithms. We develop a highly efficient bottleneck optimizer to explore the search space, and we evaluate different learning models, including multi-layer perceptrons, graph neural networks, attention networks, and jumping knowledge networks, aiming to build a performance predictor that is both highly accurate and robust. Our studies show that our automated tools can quickly optimize the microarchitecture of general applications. This opens the door for those without hardware expertise to try customized computing, which in turn helps broaden the FPGA community and further advance its technology.
