Democratizing Tensor Processors: Efficient and Generalized Tensor Computation with Architectural Support

No data is associated with this publication.
License: Creative Commons BY-NC-ND 4.0
Abstract

Tensor processors, notably matrix units (MXUs), have become indispensable in accelerating matrix operations for machine learning. However, their specialized design and limited support for varying data types and operators have hindered wider adoption. This dissertation tackles these limitations by enhancing the flexibility and capabilities of tensor processors across three key areas.

First, multi-mode matrix processing units are introduced, capable of efficiently handling both IEEE 754 single-precision and complex 32-bit floating-point numbers. This innovation broadens the applicability of MXUs in scientific computing without requiring significant modifications to existing systems.
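
The abstract does not spell out how complex arithmetic is mapped onto a real-valued datapath, but the standard decomposition below gives the intuition: a complex matrix product reduces to a handful of real matrix products, so a single-precision MXU can be reused for complex operands. This is shown for illustration only and is not necessarily the exact scheme the proposed multi-mode units implement.

```latex
% Standard "4M" decomposition of a complex matrix product into real ones.
% Let A = A_r + iA_i and B = B_r + iB_i, with A_r, A_i, B_r, B_i real.
\[
  AB = (A_r B_r - A_i B_i) + i\,(A_r B_i + A_i B_r)
\]
% Four real GEMMs (or three, with a Karatsuba-style "3M" rewrite) plus
% additions suffice, so the same single-precision datapath can serve
% both the real and the complex mode.
```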

Second, SIMD², a novel programming paradigm and architecture, is proposed to extend MXU capabilities beyond matrix multiplication to a wider range of generalized matrix operations. By leveraging existing tensor processor infrastructure, SIMD² offers substantial performance improvements over traditional approaches, further expanding the utility of these processors.
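
One well-known family of generalized matrix operations replaces the (+, ×) pair of ordinary GEMM with another semiring; the (min, +) product used in all-pairs shortest paths is a common example. The NumPy sketch below illustrates only that structural similarity: the function name is hypothetical, and the actual operator set and programming interface of SIMD² are defined in the dissertation, not here.

```python
import numpy as np

def minplus_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Tropical (min, +) matrix product: c[i, j] = min_k (a[i, k] + b[k, j]).

    Structurally identical to GEMM with (*) replaced by (+) and (+) by
    min, which is why an array of multiply-accumulate cells can be
    generalized to such operators with modest hardware changes.
    """
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    # Broadcast to shape (n, k, m), then reduce over the shared dimension k.
    return np.min(a[:, :, None] + b[None, :, :], axis=1)

# One (min, +) product over a distance matrix relaxes every two-hop
# path -- the core step of all-pairs shortest paths.
inf = np.inf
dist = np.array([[0.0, 3.0, inf],
                 [inf, 0.0, 1.0],
                 [2.0, inf, 0.0]])
print(minplus_matmul(dist, dist))  # shortest distances using at most 2 edges
```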

Finally, to address the challenges of memory-bound sparse tensor computations, a new compute dataflow, Output-stationary-Element-wise-Input-stationary (OEI), and its corresponding architecture, SIDA, are presented. This combined approach exploits inter- and intra-operator reuse opportunities, significantly reducing memory traffic and enhancing the efficiency of tensor processors in sparse linear algebra workloads.
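
The abstract names the OEI dataflow without detailing it, so the sketch below illustrates only the baseline notion of stationarity it builds on: in an output-stationary schedule for sparse-dense matrix multiplication (SpMM), each output row's partial sums stay resident while inputs stream past, so partial sums touch memory only once. The element-wise and input-stationary stages that distinguish OEI, and anything about SIDA, are deliberately not reproduced here.

```python
import numpy as np
from scipy.sparse import random as sparse_random

def spmm_output_stationary(a_csr, b: np.ndarray) -> np.ndarray:
    """SpMM C = A @ B (A sparse CSR, B dense), output-stationary schedule.

    Each output row is accumulated locally and written back exactly once,
    the kind of reuse that stationarity-based dataflows exploit.
    """
    n = a_csr.shape[0]
    m = b.shape[1]
    c = np.zeros((n, m))
    indptr, indices, data = a_csr.indptr, a_csr.indices, a_csr.data
    for i in range(n):                      # output row i stays "stationary"
        acc = np.zeros(m)                   # local accumulator for row i
        for p in range(indptr[i], indptr[i + 1]):
            acc += data[p] * b[indices[p]]  # stream nonzeros and B rows past it
        c[i] = acc                          # single write-back per output row
    return c

a = sparse_random(64, 64, density=0.05, format="csr", random_state=0)
b = np.random.default_rng(0).standard_normal((64, 8))
assert np.allclose(spmm_output_stationary(a, b), a @ b)
```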
