The once-exponential speedup of general-purpose processors (e.g., CPUs), driven by transistor scaling, is fading, which urges both industry and academia to find more energy-efficient
and performant architecture organizations. Consequently, research on accelerators specialized for applications of interest has emerged, because they promise speedup and energy
savings while retaining flexibility. Designing and implementing specialized accelerators requires
intensive human effort to study the target applications and determine tradeoffs between
performance and cost. In addition, newly proposed hardware often implies lagging
compilation techniques, which hinders programming productivity. All these facts significantly limit the adoption of programmable accelerators.
Moreover, prior development effort can hardly be reused in other applicable domains,
because current software/hardware co-designed innovations seldom consider modularity for future integration. Therefore, the research projects presented in this dissertation aim
to significantly reform the full-stack reconfigurable accelerator design paradigm. Ideally,
each software/hardware co-design feature is included in a universal design space for further accelerator composition, so that people no longer build accelerators from scratch.
Further, an accelerator can be automatically generated from the given applications of
interest written in a unified high-level programming interface.
To achieve this goal, this dissertation develops a framework, DSAGEN, which includes an
accelerator design space with rich software/hardware co-design features, a compiler that targets
accelerators at arbitrary design points within this space, and a design automation
algorithm that efficiently searches this space. According to our evaluation, the compiler
robustly targets multiple application suites on hardware with arbitrary feature combinations.
The framework-generated accelerators achieve perf/mm² comparable to that of prior
handcrafted domain-specific accelerators.
In addition, to demonstrate the wide applicability of our approach, the insights and
principles learned in pursuing this goal are also applied to related research questions.
Deploying a DSAGEN-generated accelerator as a reconfigurable overlay on an FPGA saves
orders of magnitude of compilation and reconfiguration time compared with conventional
high-level synthesis, while retaining flexibility. This approach suggests that a deeply
specialized programmable overlay accelerator can potentially supplement the existing
high-level programming paradigm of FPGAs. Also, the compilation techniques for spatial architectures
developed in DSAGEN can be applied to compiling an emerging instruction paradigm specialized for tensor operations: a productive and extensible compilation framework, UNIT,
is presented for these instructions. The extensibility of this framework allows developers to
easily integrate new instructions by describing the instruction semantics. High-performance
code for end-to-end inference, which outperforms vendor-provided libraries by up to 2.2×, can
be generated by tensorized rewriting, accompanied by our automated tuning strategies.