eScholarship
Open Access Publications from the University of California

Fractal matrix multiplication : a case study on portability of cache performance

Abstract

In this paper we demonstrate the practical portability of a simple version of matrix multiplication designed to exploit maximal and predictable locality at all levels of the memory hierarchy, with no a priori knowledge of the specific organization of the memory system of any particular machine. We show that portability across memory hierarchies does not sacrifice floating-point performance, which is always a significant fraction of peak and, on at least one machine, is higher than that of ATLAS and the vendor-supplied multiplication.

As a proof of concept, we show that the theoretical conclusions on locality exploitation yield practical implementations with the desired properties.
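
To make the idea of locality without machine-specific tuning concrete, the following is a minimal sketch of a recursive (divide-and-conquer) matrix multiplication, the general family to which such approaches belong. It is not the implementation described in the paper: the names `recursive_matmul` and `multiply_block`, the `base` cutoff, the row-major list-of-lists storage, and the restriction to square matrices whose size is a power of two are all assumptions made for illustration. The point it shows is that each recursive call works on quadrants half the size of its parent, so at some depth the working set fits in each cache level without the code ever naming a cache size.

```python
def multiply_block(A, B, C, ar, ac, br, bc, cr, cc, n):
    """Plain triple loop on an n x n block: C_block += A_block * B_block."""
    for i in range(n):
        for k in range(n):
            a = A[ar + i][ac + k]
            for j in range(n):
                C[cr + i][cc + j] += a * B[br + k][bc + j]


def recursive_matmul(A, B, C, ar=0, ac=0, br=0, bc=0, cr=0, cc=0, n=None, base=2):
    """Accumulate A * B into C by recursing on quadrants.

    Assumptions (illustrative only): A, B, C are square lists of lists,
    the size is a power of two, and `base` is a hypothetical cutoff below
    which the plain triple loop is used.
    """
    if n is None:
        n = len(A)
        assert n > 0 and n & (n - 1) == 0, "size must be a power of two"
    if n <= base:
        multiply_block(A, B, C, ar, ac, br, bc, cr, cc, n)
        return
    h = n // 2
    # Eight half-size products: C[i][j] += A[i][k] * B[k][j] over quadrants.
    for i in (0, h):
        for j in (0, h):
            for k in (0, h):
                recursive_matmul(A, B, C,
                                 ar + i, ac + k,
                                 br + k, bc + j,
                                 cr + i, cc + j,
                                 h, base)


if __name__ == "__main__":
    n = 4
    A = [[float(i * n + j) for j in range(n)] for i in range(n)]
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    C = [[0.0] * n for _ in range(n)]
    recursive_matmul(A, I, C)
    print(C == A)
```

The sketch uses no cache-size or blocking parameter that depends on the machine; `base` only limits recursion overhead. A production version would also store the matrices in a layout that matches the recursion order, which this sketch does not attempt.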
