Synthesis of Custom Networks of Processing Elements Onto Field-Programmable Gate Arrays for Physical System Emulation
Executing a complex physical system model in real time or faster has numerous applications in cyber-physical systems. For instance, a human lung model that executes in real time can be used to test a ventilator. A complex physical system can often be captured with thousands of ordinary differential equations (ODEs). We introduce an approach that maps the ODEs of a physical system to a custom network of processing elements on a field-programmable gate array (FPGA). A processing element (PE) is a lightweight processor that solves a subset of the ODEs. The custom interconnection of the PEs is based on the data dependencies among the ODEs. The PEs execute in parallel and communicate with each other. To automate the design process, we developed a compilation tool that finds a good mapping between the ODEs and the PEs and generates synthesizable hardware description language (HDL) code for the entire design.
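As a rough illustration of the mapping idea, the sketch below binds ODEs to PEs and derives the PE-to-PE links from the data dependencies. It is a minimal sketch under stated assumptions, not the compilation tool itself: the names `map_odes_to_pes` and `lockstep_euler`, the contiguous-block binding, the forward Euler update, and the four-variable example system are all hypothetical choices made for illustration.

```python
def map_odes_to_pes(deps, num_pes):
    """Bind each ODE to a PE and derive the directed PE-to-PE links.

    deps maps an ODE's state variable to the variables its right-hand side reads.
    The binding here is a naive contiguous-block split (an assumption); a real
    tool would search for a mapping that balances load and limits communication.
    """
    odes = sorted(deps)
    binding = {ode: i * num_pes // len(odes) for i, ode in enumerate(odes)}
    links = set()
    for ode, reads in deps.items():
        for src in reads:
            if binding[src] != binding[ode]:
                # The PE owning src must send its value to the PE owning ode.
                links.add((binding[src], binding[ode]))
    return binding, links


def lockstep_euler(state, rhs, binding, num_pes, dt):
    """Advance one time step: each PE updates only its own ODEs from a shared
    snapshot of the previous step (forward Euler is assumed for simplicity)."""
    snapshot = dict(state)
    new_state = {}
    for pe in range(num_pes):  # sequential stand-in for PEs running in parallel
        for ode, owner in binding.items():
            if owner == pe:
                new_state[ode] = snapshot[ode] + dt * rhs[ode](snapshot)
    return new_state


# Hypothetical example: a four-variable chain in which neighbours are coupled.
deps = {"x0": ["x1"], "x1": ["x0", "x2"], "x2": ["x1", "x3"], "x3": ["x2"]}
rhs = {
    "x0": lambda s: s["x1"] - s["x0"],
    "x1": lambda s: s["x0"] - 2 * s["x1"] + s["x2"],
    "x2": lambda s: s["x1"] - 2 * s["x2"] + s["x3"],
    "x3": lambda s: s["x2"] - s["x3"],
}
binding, links = map_odes_to_pes(deps, num_pes=2)
state = lockstep_euler({"x0": 1.0, "x1": 0.0, "x2": 0.0, "x3": 0.0},
                       rhs, binding, num_pes=2, dt=0.1)
```

In this example only the x1/x2 coupling crosses the PE boundary, so the derived interconnect needs just one link in each direction between the two PEs; every other dependency stays local to a PE.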
We first investigated a general-purpose PE that can solve any type of ODE. The network of general PEs achieves 10-20x speedups over a single-threaded Intel i7-950 CPU and 4x speedups over an Nvidia GTX 460 GPU. It also yields 2x speedups compared to a commercial high-level synthesis tool. We further optimized our approach by building custom PEs that can solve only certain types of ODEs. For homogeneous physical systems (those containing only one or a few types of ODEs), the network of custom PEs yields a further 6x speedup over a network of general PEs of comparable size. Finally, we introduced the network of heterogeneous PEs, which may contain both general PEs and different types of custom PEs. We developed an allocation and binding heuristic to explore the large design space. The network of heterogeneous PEs achieves 7x and 6x speedups over the networks of general PEs and single-type custom PEs, respectively, and is on average 10x faster than the circuits generated by a high-level synthesis tool.
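To make the heterogeneous design space concrete, the following sketch shows one possible greedy allocation and binding loop. It is not the heuristic developed in this work, whose details are not given here. The cycle counts, area costs, area budget, and the function names `cost`, `bind`, and `allocate` are all hypothetical, chosen only to show how allocation (which PEs to instantiate) and binding (which PE solves which ODE) interact.

```python
# Hypothetical cost model: a custom PE solves its one ODE type faster and is
# smaller than a general PE, but cannot solve any other type.
GENERAL_CYCLES, CUSTOM_CYCLES = 6, 1
GENERAL_AREA, CUSTOM_AREA = 2, 1


def cost(pe_kind):
    """Cycles a PE of this kind spends on one ODE it is able to solve."""
    return GENERAL_CYCLES if pe_kind == "general" else CUSTOM_CYCLES


def bind(ode_types, pes):
    """Greedy binding: each ODE goes to the compatible PE that finishes it
    earliest. Returns the estimated step latency (the largest PE load)."""
    load = [0] * len(pes)
    for t in ode_types:
        candidates = [i for i, kind in enumerate(pes) if kind in ("general", t)]
        best = min(candidates, key=lambda i: load[i] + cost(pes[i]))
        load[best] += cost(pes[best])
    return max(load)


def allocate(ode_types, area_budget):
    """Greedy allocation: repeatedly add the PE that lowers latency the most
    while the area budget allows it."""
    pes = ["general"]  # one general PE guarantees every ODE can be bound
    used = GENERAL_AREA
    kinds = ["general"] + sorted(set(ode_types))
    while True:
        current = bind(ode_types, pes)
        options = []
        for kind in kinds:
            area = GENERAL_AREA if kind == "general" else CUSTOM_AREA
            if used + area <= area_budget:
                options.append((bind(ode_types, pes + [kind]), area, kind))
        if not options:
            break
        latency, area, kind = min(options)
        if latency >= current:
            break  # no remaining candidate improves the estimate
        pes.append(kind)
        used += area
    return pes, bind(ode_types, pes)


# Hypothetical system: six ODEs of type "A" and two of type "B".
pes, latency = allocate(["A"] * 6 + ["B"] * 2, area_budget=4)
```

Under these made-up costs, the loop ends with a mix of one general PE and one custom PE per ODE type rather than a homogeneous network, which is the kind of trade-off an allocation and binding heuristic has to explore.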