eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Accelerating Synchronous Many-Core Networks on FPGAs

Abstract

Applications running on custom architectures with hundreds of specialized processing elements (PEs) on field-programmable gate arrays (FPGAs) can achieve speedups of 10x or more over desktop or embedded processors. Applications that benefit from such many-core PE networks include those with non-centralized communication and memory requirements, such as real-time emulation of physical systems, video streaming, and signal processing. Synchronous many-core networks (synchMCs) are synchronized many-core PE networks that execute distributed computations in parallel, keeping every PE in the network in lockstep through a regular, periodic communication phase.
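The lockstep behavior described above can be illustrated with a minimal software sketch. It assumes a toy ring of four PEs and an averaging computation; the names `step_lockstep`, `ring`, and `average` are hypothetical and chosen only for illustration, not taken from the dissertation.

```python
def step_lockstep(states, neighbors, update):
    """One synchronous step: a communication phase followed by a compute phase."""
    # Communication phase: every PE receives its neighbors' current values.
    inbox = {pe: [states[n] for n in neighbors[pe]] for pe in states}
    # Compute phase: every PE updates from its own state and received values only.
    return {pe: update(states[pe], inbox[pe]) for pe in states}


def ring(n):
    """Neighbor map for a ring of n PEs (assumed topology for illustration)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def average(own, received):
    """Toy PE computation: average the PE's value with the values it received."""
    return (own + sum(received)) / (1 + len(received))


states = {0: 3.0, 1: 0.0, 2: 0.0, 3: 0.0}
states = step_lockstep(states, ring(4), average)
print(states)  # {0: 1.0, 1: 1.0, 2: 0.0, 3: 1.0}
```

Because all values are exchanged before any PE computes, no PE ever observes a neighbor's partially updated state, which is the property the periodic communication phase provides.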

An FPGA has limited logic and routing resources on which to place a synchMC network. Large or densely connected synchMCs can overwhelm the available resources and severely reduce the resulting FPGA circuit frequency, and thus application performance. This dissertation focuses primarily on methods for successfully implementing large synchMCs, consisting of hundreds of PEs and thousands of interconnects, on an FPGA. An approach is described that considers the natural structure of a physical model, such as a mesh, ring, or cube, to perform a graph embedding of a synchMC on a 2-dimensional grid of physical PE regions. The described PE placement techniques reduce critical path length and allow previously unroutable designs to complete place-and-route. In addition, an automated approach to reducing the number of wires in a synchMC allows applications of arbitrary size and complexity to fit within the resource constraints of a target FPGA. Time-multiplexed communication for synchMCs is introduced, and a greedy scheduler, a heuristic scheduler, and an integer linear program scheduler are described. Finally, an approach for exploring the configuration space of a synchMC executing real-time emulation of a physical model is described. Using the described synthesis approaches, synchMCs enable the fastest emulation of physical models on a moderately priced platform, executing 15x faster than a desktop PC, 26x faster than a GPU, 9x faster than a network-on-chip (NoC), and 9x faster than a circuit produced via high-level synthesis (HLS).
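To make the idea of greedy scheduling for time-multiplexed communication concrete, the following is a minimal sketch under stated assumptions. It assumes each PE can send and receive at most `ports_per_pe` values per time slot and that each (source, destination) transfer is unique. This is a generic first-fit heuristic and not necessarily the scheduler developed in the dissertation; the function name and resource model are hypothetical.

```python
def greedy_schedule(transfers, ports_per_pe=1):
    """Assign each (src, dst) transfer to the earliest slot with free ports."""
    out_used = {}  # (pe, slot) -> outgoing transfers already placed in that slot
    in_used = {}   # (pe, slot) -> incoming transfers already placed in that slot
    schedule = {}
    for src, dst in transfers:
        slot = 0
        # First fit: advance until both the sender and the receiver have capacity.
        while (out_used.get((src, slot), 0) >= ports_per_pe
               or in_used.get((dst, slot), 0) >= ports_per_pe):
            slot += 1
        out_used[(src, slot)] = out_used.get((src, slot), 0) + 1
        in_used[(dst, slot)] = in_used.get((dst, slot), 0) + 1
        schedule[(src, dst)] = slot
    return schedule


transfers = [(0, 1), (0, 2), (1, 2), (2, 0)]
schedule = greedy_schedule(transfers)
print(schedule)  # {(0, 1): 0, (0, 2): 1, (1, 2): 0, (2, 0): 0}
```

In this example PE 0 has two outgoing transfers but only one output port, so the second is deferred to slot 1 and the communication phase takes two slots. Fewer ports mean fewer physical wires at the cost of a longer communication phase, which is the trade-off that time multiplexing exposes.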
