The proliferation of new Artificial Intelligence (AI) and Machine Learning (ML) accelerators has enhanced the performance of domain-specific applications with tightly integrated software stacks. However, this focus often overlooks other critical applications that could benefit from these unique architectures. This dissertation examines whether AI/ML applications fully utilize these architectures, proposes an alternative to tightly integrated software stacks, and presents a novel approach to evaluating accelerators for both domain-specific and broader applications through three bodies of work.
These three works collectively aim to expand the application domains of accelerators, benefiting a wide range of critical applications.
The first work presents TPUPoint, a profiling and optimization tool that assesses Google's Tensor Processing Units (TPUs). It addresses the issue of underutilized accelerators by classifying repetitive execution patterns into phases and identifying the timing-critical operations within each phase. TPUPoint demonstrates that, despite being designed for AI/ML, these accelerators may not be used to their full potential, prompting the question of whether applications outside AI/ML might better utilize these devices.
The second work, T2SP, seeks to overcome the limitation of accelerators being restricted to specific software stacks. It targets platform-agnostic tensor computations by combining Data Parallel C++ (DPC++) with T2X, a framework that separates functional specifications from spatial mappings for architectures such as FPGAs and CGRAs. This separation provides portability, efficient hardware utilization, and ease of development by allowing users to write implementations that are not confined to a single architecture.
The final work, Accel-Bench, is a benchmark suite designed to quantify the performance gains from using hardware-accelerated functions across application domains both within and outside AI/ML. Accel-Bench comprises ten applications that exercise hardware-accelerated functions such as GEMM, CONV, and FFT. The suite shows that applications can achieve comparable or superior performance on hardware accelerators, even when the accelerated formulation increases computational complexity.
Together, these projects provide comprehensive solutions for evaluating performance, enabling portability, and diversifying applications across domains, advancing the field of hardware-accelerated computing.