Real-time and low-energy constraints, together with ever-growing data volumes, continue to challenge high performance computing (HPC). As a result, it has become increasingly important to advance the capabilities of high performance architectures. Single instruction multiple data (SIMD) designs are ideal for targeting data- and compute-intensive HPC workloads. Accelerator-rich architectures, in particular, implement application-specific functionality directly in hardware via on-chip accelerators, providing improvements of many orders of magnitude in power efficiency and performance. Unlike instruction-based SIMD architectures, such as graphics processing units (GPUs), accelerator-rich designs avoid the overhead of instruction processing while maintaining flexibility by way of accelerator composition and virtualization.
This dissertation explores various aspects of hardware-based acceleration, including fine-grained vs. coarse-grained designs, ASIC-based vs. FPGA-based implementations, and domain-specific vs. domain-adaptive systems. While accelerator-rich designs are well-suited for exploiting data-level parallelism, they are highly susceptible (as are all SIMD architectures) to performance degradation caused by divergent control flow. Since HPC workloads contain varying types and amounts of control flow, this SIMD divergence issue must be addressed in order for accelerator-based designs to yield more effective HPC platforms. As such, this work also investigates an approximation-based approach for eliminating control flow. We exploit the learning capabilities of neural networks to approximate and regularize the control-flow regions of applications, thereby trading off precision for performance gains. Furthermore, we develop lightweight checks to ensure output reliability at runtime, allowing our neural-network-based approximations to be leveraged in a dynamically adaptive fashion.
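To illustrate the core idea, consider a minimal sketch (not taken from the dissertation's actual kernels): a branchy piecewise function causes SIMD lanes to diverge, whereas a smooth surrogate of the same region, here a single softplus unit, the building block of a neural approximator, executes an identical, branch-free instruction stream on every lane. The function names, the `beta` sharpness parameter, and the sample points are illustrative assumptions.

```python
import math

def piecewise_kernel(x):
    # Divergent control flow: SIMD lanes that take different
    # branches must serialize, degrading throughput.
    if x < 0.0:
        return 0.0
    else:
        return x

def smooth_approx(x, beta=8.0):
    # Branch-free softplus surrogate of the same region (a single
    # neural-network unit): every lane runs the same instructions,
    # trading a small amount of precision for regular control flow.
    return math.log1p(math.exp(beta * x)) / beta

# Approximation error of the surrogate over a few sample inputs.
xs = [-2.0, -0.1, 0.0, 0.1, 2.0]
max_err = max(abs(piecewise_kernel(x) - smooth_approx(x)) for x in xs)
```

The worst-case error here occurs at the branch point (`x == 0`) and shrinks as `beta` grows, which is the precision-for-performance trade-off described above.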
Our work culminates in the following hybrid approach: a heterogeneous SIMD platform with both precise (conventional) and approximate (neural) accelerators, managed by online error control mechanisms. The implementation incorporates ASIC components into the platform, along with approximation control methods (NN training tools, static software interfaces, and dynamic hardware components) that maintain acceptable error rates. Taking inspiration from the partial observability and stochasticity of the world around us, this work combines data-parallel acceleration with neural approximation in an effort to advance high performance computation.
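The hybrid dispatch can be sketched as follows, under loudly labeled assumptions: `precise_kernel`, `approx_kernel`, and the per-batch residual check are hypothetical stand-ins (a real lightweight check would use a cheap learned error predictor rather than the exact result), and the tolerance is arbitrary. The sketch only shows the control structure: run the approximate path, apply an online check, and fall back to the precise path when the check fails.

```python
import math

def precise_kernel(x):
    # Precise (conventional) accelerator path: exact result.
    return abs(x)

def approx_kernel(x):
    # Approximate (neural-style) path: branch-free smooth surrogate.
    return math.sqrt(x * x + 1e-4)

def checked_dispatch(xs, tol=0.1):
    # Online error control: accept the approximate batch only if a
    # lightweight check passes; otherwise re-run on the precise path.
    approx = [approx_kernel(x) for x in xs]
    # Hypothetical check: here we compare against the exact answer for
    # clarity; a deployed check would be far cheaper than recomputation.
    if all(a - abs(x) < tol for a, x in zip(approx, xs)):
        return approx, "approx"
    return [precise_kernel(x) for x in xs], "precise"

outputs, path_taken = checked_dispatch([-1.0, 0.0, 2.0])
```

The design point this captures is that approximation is applied opportunistically: reliability is enforced at runtime rather than assumed at compile time.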