In this work, we propose the design of a floating-point Proportional-Integral-Derivative (PID) controller accelerator and present its implementation on a Lattice UP5K FPGA. The implementation attains a throughput of 645 K samples per second for a single controller and a net throughput of 1032 K samples per second for an interleaved double controller, at a power consumption of 20 mW. Our single-controller and interleaved double-controller systems respectively achieve over 70x and 120x the performance of a similarly sized microprocessor with comparable power constraints, and 5x the power efficiency of a larger, more powerful ARM Cortex-M4F capable of hardware floating-point operations. We obtain this performance with a systolic array design that uses simplified hardware floating-point operations synthesized onto the embedded DSP blocks of a low-power FPGA. Additionally, we support a simple treatment of complex reference signals, such as sinusoidal inputs, by storing the reference in on-chip block RAM as a time series. This level of power efficiency and performance on a small board is essential for our target applications in micro- or insect-scale robotics.
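
To make the idea of a stored reference concrete, the following is a minimal software sketch of a textbook discrete PID update that reads its reference from a precomputed table holding one period of a sinusoid, loosely mirroring a time series held in block RAM. It is not the accelerator's datapath: the names `make_sine_table`, `PID`, and `track`, the gains, the time step, and the integrator plant are all illustrative assumptions, and the sketch uses ordinary double-precision arithmetic rather than the simplified hardware floating-point operations described above.

```python
import math


def make_sine_table(n_samples, amplitude=1.0):
    """Precompute one period of a sinusoid as a time series (hypothetical helper)."""
    return [amplitude * math.sin(2.0 * math.pi * k / n_samples)
            for k in range(n_samples)]


class PID:
    """Textbook discrete PID: rectangular integral, backward-difference derivative."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, reference, measurement):
        error = reference - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def track(table, pid, n_steps, plant_gain=1.0):
    """Follow the tabulated reference with an assumed integrator plant dy/dt = gain * u.

    Returns the absolute tracking error seen by the controller at each step.
    """
    y = 0.0
    errors = []
    for k in range(n_steps):
        r = table[k % len(table)]        # wrap around, like a circular RAM read
        errors.append(abs(r - y))
        u = pid.step(r, y)
        y += plant_gain * u * pid.dt     # forward-Euler plant update (assumption)
    return errors


if __name__ == "__main__":
    table = make_sine_table(1000)
    errors = track(table, PID(kp=50.0, ki=0.0, kd=0.0, dt=1e-3), n_steps=3000)
    print(f"max tracking error: {max(errors):.4f}")
```

Because the reference is only ever indexed, not computed, the per-sample work in the loop is limited to the PID arithmetic itself; changing the reference shape in this sketch amounts to filling the table with different samples.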