Developing, Synthesizing, and Automating Domain-Specific Accelerator
eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Abstract

The once-exponential speedup of general-purpose processors (e.g., CPUs), driven by transistor scaling, is fading, which urges both industry and academia to find more energy-efficient and performant architectural organizations. Research on accelerators specialized for applications of interest has therefore emerged, because such accelerators promise speedup and energy savings while retaining flexibility. Designing and implementing specialized accelerators requires intensive human effort to study the target applications and determine tradeoffs between performance and cost. In addition, newly proposed hardware often implies lagging compilation techniques, which hinders programming productivity. All of these facts significantly limit the adoption of programmable accelerators. Moreover, prior development effort can hardly be reused in other applicable domains, because current software/hardware co-designed innovations seldom consider modularity for future integration. The research projects presented in this dissertation therefore aim to significantly reform the full-stack reconfigurable accelerator design paradigm. Ideally, each software/hardware co-design feature can be incorporated into a universal design space for further accelerator composition, so that people no longer build accelerators from scratch. Further, an accelerator can be automatically generated from the given applications of interest, written in a unified high-level programming interface. To achieve this goal, this dissertation develops a framework, DSAGEN, comprising an accelerator design space with rich software/hardware co-design features, a compiler that targets accelerators at arbitrary design points within this space, and a design automation algorithm that efficiently searches this space. According to our evaluation, the compiler can robustly target multiple application suites on hardware with arbitrary feature combinations.
The framework-generated accelerators achieve perf/mm² comparable to prior handcrafted domain-specific accelerators. In addition, to demonstrate the wide applicability of our approach, the insights and principles learned in pursuing this goal are also applied to related research questions. Deploying the DSAGEN-generated accelerator as a reconfigurable overlay on an FPGA saves orders of magnitude of compilation and reconfiguration time compared with conventional high-level synthesis, while retaining flexibility. This approach suggests that a deeply specialized programmable overlay accelerator can potentially supplement the existing high-level programming paradigm for FPGAs. Also, the compilation techniques for spatial architectures developed in DSAGEN can be applied to compiling an emerging instruction paradigm specialized for tensor operations: a productive and extensible compilation framework, UNIT, is presented for these instructions. The extensibility of this framework allows developers to easily integrate new instructions by describing the instruction semantics. High-performance code for end-to-end inference, which outperforms vendor-provided libraries by up to 2.2×, can be generated by tensorized rewriting, accompanied by our automated tuning strategies.
