Custom, application-specific implementations of digital signal processing (DSP) systems offer high performance and high energy efficiency, but require significant design and verification effort. Fast Fourier transform (FFT) processors with a broad range of performance requirements are needed for many modern-day signal processing applications, ranging from medical imaging and machine learning to communication and radio astronomy. Certain applications, including modern-day wireless communications, require runtime reconfigurability across a multitude of mixed-radix FFT sizes, high throughput, low latency, and lower power. Convolutional neural networks use multi-dimensional FFTs with stringent requirements on quantization bounds. Despite sharing underlying algorithms and hardware constructs, FFT designs are often difficult to reuse on a per-application or even per-platform basis, leading to redeveloping and reverifying conceptually similar instances. Hardware generators are attractive solutions for effectively balancing fine-grained control of implementation details with simple, rapidly retargetable hardware descriptions, but the existing FFT generators do not support key features like runtime reconfigurability or more general mixed-radix FFTs, limiting their applicability.
This thesis presents ACED (A Chisel Environment for DSP), a library extension to the Chisel hardware construction language and the FIRRTL (Flexible Intermediate Representation for RTL) compiler specifically created to simplify the development of hardware DSP generators. ACED allows DSP designs to be specified at a higher level of abstraction, making it easier to add new features as they become necessary. Optimization and specialization are handled via platform-specific compiler passes that promote generator reusability. The ACED library has been used to create a parameterizable memory-based, runtime-reconfigurable 2^n 3^m 5^k 7^l FFT generator to support next-generation wireless systems prototyping. The generator uses a conflict-free, in-place, multi-bank SRAM design, and exploits the duality of decimation-in-frequency and decimation-in-time FFTs to support continuous data flow with ~2N memory. The hardware itself is templated so that Chisel/Scala code can be written to add additional functionality, and users pass parameters to the hardware template via a "firmware" block. This is the essence of the Chisel DSP generator methodology.
The FFT generator has been proven via a 0.37-mm^2 LTE/Wi-Fi compatible FFT RISC-V accelerator instance with measured performance and area comparable to state-of-the-art. To demonstrate the use case of the FFT generator in a larger systems context, a low-power signal acquisition front end capable of sensing frequency-sparse signals in a 1.89-GHz bandwidth with a resolution of 175 kHz in real time has also been prototyped in a 16-nm process. The spectral analysis chip relies on mixed-radix FFTs and reconstruction via the fast Fourier aliasing-based sparse transform (FFAST) algorithm to recover signals in compressed form from a subsampled input. This thesis presents new tools and design methodologies to rapidly design DSP hardware generators---with a particular focus on FFTs---for use in emerging applications such as spectrum sensing for cognitive radio, RADAR, and more.