Democratizing Tensor Processors: Efficient and Generalized Tensor Computation with Architectural Support
- Zhang, Yunan
- Advisor(s): Tseng, Hung-Wei
Abstract
Tensor processors, notably matrix multiplication units (MXUs), have become indispensable for accelerating the matrix operations at the heart of machine learning. However, their specialized designs and limited support for diverse data types and operators have hindered wider adoption. This dissertation tackles these limitations by enhancing the flexibility and capabilities of tensor processors in three key areas.
First, multi-mode matrix processing units (\MPCMXU{}) are introduced, capable of efficiently handling both IEEE 754 single-precision and complex 32-bit floating-point numbers. This broadens the applicability of MXUs to scientific computing without requiring significant modifications to existing systems.
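To make the multi-mode idea concrete, the sketch below shows the standard algebraic decomposition of a complex matrix product into four real matrix products, the kind of mapping that lets real-valued matrix hardware serve complex workloads. This is a generic illustration of the principle, not the \MPCMXU{} microarchitecture itself; the function name is hypothetical.

```python
import numpy as np

def complex_matmul_4m(A, B):
    """Complex GEMM via four real GEMMs (the classic "4M" decomposition).

    (Ar + i*Ai)(Br + i*Bi) = (Ar@Br - Ai@Bi) + i*(Ar@Bi + Ai@Br),
    so each real product is a job a real-valued MXU can run natively.
    """
    Ar, Ai = A.real, A.imag
    Br, Bi = B.real, B.imag
    Cr = Ar @ Br - Ai @ Bi  # real part: two real matrix products
    Ci = Ar @ Bi + Ai @ Br  # imaginary part: two more
    return Cr + 1j * Ci
```

Hardware that interleaves real and imaginary operands in one unit avoids materializing the four intermediate products separately, which is where a multi-mode design can win over this software decomposition.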
Second, \SIMDD{}, a novel programming paradigm and architecture, is proposed to extend MXU capabilities beyond matrix multiplication to a broader class of generalized matrix operations. By leveraging existing tensor processor infrastructure, \SIMDD{} offers substantial performance improvements over traditional approaches, further expanding the utility of these processors.
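As background for why matmul hardware can host non-matmul operators at all, the sketch below shows two well-known reductions recast as matrix products: a row-wise sum becomes a product with a ones vector, and an inclusive prefix sum becomes a product with a lower-triangular ones matrix. This illustrates the general technique only; it is not the \SIMDD{} programming model, and both function names are hypothetical.

```python
import numpy as np

def reduce_rows(X):
    """Row-wise sum expressed as a matrix product with a ones vector."""
    return X @ np.ones((X.shape[1], 1), dtype=X.dtype)

def prefix_sum(x):
    """Inclusive scan expressed as a product with a lower-triangular
    ones matrix: row i of L selects elements 0..i of x."""
    L = np.tril(np.ones((x.size, x.size), dtype=x.dtype))
    return L @ x
```

Both rewrites trade extra multiplies by 0/1 constants for the ability to run on a high-throughput matrix unit, which is profitable when the unit would otherwise sit idle.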
Finally, to address the challenges of memory-bound sparse tensor computations, a new compute dataflow, \underline{O}utput-stationary-\underline{E}lement-wise-\underline{I}nput-stationary (\OEI{}), and its corresponding architecture, SIDA, are presented. This combined approach exploits inter- and intra-operator reuse opportunities, significantly reducing memory traffic and enhancing the efficiency of tensor processors in sparse linear algebra workloads.
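To illustrate the output-stationary half of the reuse opportunity \OEI{} exploits, the sketch below runs a CSR sparse-dense matrix multiplication with a loop order in which each output row accumulates in a local buffer until it is complete, so it is written back exactly once. This is a generic software model of output-stationary dataflow, not the SIDA architecture or the full \OEI{} schedule; the function name is hypothetical.

```python
import numpy as np

def spmm_output_stationary(indptr, indices, data, B):
    """CSR SpMM, C = A @ B, with an output-stationary loop order.

    `indptr`, `indices`, `data` are the standard CSR arrays of sparse A.
    The accumulator `acc` models an on-chip register tile: inputs stream
    past it, and each finished output row incurs a single memory write.
    """
    n_rows = len(indptr) - 1
    C = np.zeros((n_rows, B.shape[1]), dtype=B.dtype)
    for i in range(n_rows):
        acc = np.zeros(B.shape[1], dtype=B.dtype)  # stays "on chip"
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * B[indices[k]]         # stream nonzeros and B rows
        C[i] = acc                                 # one write-back per row
    return C
```

Keeping partial sums resident removes the read-modify-write traffic on C that makes sparse workloads memory-bound; combining this with input-stationary phases across operators is where further inter-operator reuse comes from.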