A Scalable VLSI Architecture for Real-Time and Energy-Efficient Sparse Approximation in Compressive Sensing Systems
- Author(s): Ren, Fengbo
- Advisor(s): Markovic, Dejan
- et al.
Digital electronic industry today relies on Nyquist sampling theorem, which requires to double the size (sampling rate) of the signal representation on the Fourier basis to avoid information loss. However, most natural signals have very sparse representations on some other orthogonal (non-Fourier) basis. This mismatch implies a large redundancy in Nyquist-sampled data, making compression a necessity prior to storage or transmission. Recent advances in compressive sensing (CS) theory offer us an alternative data acquisition framework, which can greatly impact power-starved applications such as wireless sensors. CS techniques provide a universal approach to sample compressible signals at a rate significantly below the Nyquist rate with limited information loss. Therefore, CS is a promising technology for realizing configurable, cost-effective, miniaturized, and ultra-low-power data acquisition devices for mobile and wearable applications.
However, the digital signal processing of compressively-sampled data involves solving a sparse approximation problem, which requires iterative-searching algorithms that have high computational complexity and require intensive memory access. As a result, existing software solutions are neither energy-efficient nor cost-effective for real-time processing of compressively-sampled data, especially when the processing is to be performed on energy-limited devices. To solve this problem, this dissertation presents a scalable VLSI architecture that can be implemented on field-programmable gate arrays (FPGAs) or system-on-chips (SoCs) to perform dedicated-hardware-driven sparse approximation. A VLSI soft-IP core of the sparse approximation engine is developed in Verilog-HDL, which supports a floating-point data format with 10 design parameters, providing a high dynamic range and the flexibility for application-specific user customizations. Taking advantage of the algorithm-architecture co-design that leverages algorithm reformulations, configurable architectures, and efficient memory mapping schemes, the proposed VLSI architecture features a 100% utilization of the computing resources and is scalable in terms of computation parallelism and memory capability.
The hardware emulation of the soft-IP core on a 28-nm Xilinx Kintex-7 FPGA shows that our design achieves the same level of accuracy as the double-precision C program running on an Intel Core i7-4700MQ mobile processor, while providing 47-147x speed-up for ECG signal reconstruction. Furthermore, a 12-237 KS/s 12.8 mW sparse approximation engine chip is realized in a 40-nm CMOS technology for enabling the mobile data aggregation of compressively sampled biomedical signals in CS-based wireless health monitoring systems. The measurement results show that the sparse approximation engine chip operating at the minimum energy point achieves a real-time throughput for reconstructing 61-237 channels of biomedical signals simultaneously with <1% of a mobile device's 2W power budget, which is 14,100x more energy-efficient than the software solver running on the CPU. For high-sparsity signal reconstruction, the sparse approximation engine chip is 76-350x more energy-efficient than prior hardware designs. With a <1% power budget of a mobile device, the 5.13 mm2 sparse approximation engine chip integrated in 40-nm CMOS can enable a 2-3x energy saving at CS-based sensor nodes while providing timely feedback and bringing signal intelligence closer to the user, presenting a significant advantage for 24/7 wireless health monitoring.