SpVM Acceleration with Latency Masking Threads on FPGAs

UC Riverside Previously Published Works (2014)

Long memory latencies are mitigated through the use of large cache hierarchies in multi-core architectures, SIMD execution in GPU architectures, and streaming of data in FPGA-based accelerators. However, none of these approaches benefits irregular applications that exhibit no locality and rely on extensive pointer dereferencing for data accesses. Multi-threaded execution has been shown to handle such applications effectively by masking the memory latency. In the MT-FPGA model, a multi-threaded engine is implemented on the FPGA accelerator specifically to mask memory latency during the execution of irregular applications: following a memory access, execution switches to a ready thread while the suspended threads wait for the requested data value to return from memory. The multi-threaded engine is generated automatically from C code by the CHAT compilation tool and is customized to the specific application. In this paper we use the Sparse Vector Matrix (SpVM) application to evaluate the performance of MT-FPGA execution and compare it to the latest GPU architectures over a wide range of benchmarks.
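To illustrate the kind of irregular access the abstract describes, here is a minimal C sketch of a sparse matrix-vector product. It is not taken from the paper: the CSR storage layout, the function name spmv_csr, and all parameter names are assumptions made for illustration, and the abstract does not say which sparse format or kernel shape the authors use. The point is the indirect load x[col_idx[k]], whose address depends on data just read from memory and so shows little locality.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: y = A * x with A stored in CSR form.
 * row_ptr has n_rows + 1 entries; col_idx and val have row_ptr[n_rows] entries.
 * None of these names come from the paper or from the CHAT tool.
 */
void spmv_csr(size_t n_rows, const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    /* CSR convention: the first row starts at offset 0. */
    assert(row_ptr[0] == 0);

    for (size_t i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            /* Indirect, data-dependent load: the address of x[...] is only
             * known after col_idx[k] has been fetched from memory. */
            sum += val[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}
```

In a latency-masking design of the kind the abstract outlines, each such load would be a natural point to suspend one thread and switch to another that is ready. This sequential loop only shows where those loads occur, not how the engine schedules them.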

1 supplemental PDF