eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

FPGA-Based Acceleration: From Cloud to Edge

Abstract

FPGAs have increasingly shown promise of high processing power in fields such as machine learning, computer vision, and high-frequency trading, among others, thanks to their reconfigurable nature. However, one question remains unanswered: “When will FPGAs become popular and mainstream?” Although hardware description languages are notoriously challenging to design with, current software toolchains also play a major role in hindering the adoption of FPGAs. This dissertation investigates the performance gains from accelerating time series prediction on different FPGA platforms, from smaller edge computing devices to high-end data center cards, and analyzes the challenges and limitations of the current FPGA design flow (through both HDL and High-Level Synthesis development).

First, we present Edge FA-LAMP, an FPGA-accelerated implementation of the Learned Approximate Matrix Profile algorithm, which predicts the correlation between streaming data sampled in real time and a representative time series dataset used for training. We expose several technical limitations of the Xilinx DPU for convolutional neural network acceleration on FPGAs and provide a mechanism to overcome them.

Next, we show how the Learned Approximate Matrix Profile algorithm can be deployed on data center FPGA cards. We implement two different versions of FA-LAMP, targeting high-throughput and low-latency applications, and show how to integrate the DPU on the Alveo card with an Ethernet module that allows for processing real-time data streams delivered over a network. We discuss different strategies to connect the Ethernet IP to the DPU and present methods to further increase network throughput.

Finally, we show how FPGAs can be used as SmartNICs to ensure consistency in data centers. We implement a replication algorithm based on Paxos that leverages RDMA to replicate user requests in memory and to fail over the system with negligible latency.
