Fractal matrix multiplication: a case study on portability of cache performance
Abstract
In this paper we demonstrate the practical portability of a simple version of matrix multiplication designed to exploit maximal and predictable locality at all levels of the memory hierarchy, with no a priori knowledge of the specific organization of the memory system of any particular machine. We show that portability across memory hierarchies does not sacrifice floating-point performance, which is always a significant fraction of peak and, on at least one machine, is higher than that of both ATLAS and the vendor's matrix multiplication.
We present a proof of concept that the theoretical conclusions on locality exploitation yield practical implementations with the desired properties.
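The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the general idea it describes: a recursive, divide-and-conquer matrix multiplication that needs no cache-size parameters, because the quadrant subproblems eventually fit into each level of the memory hierarchy on their own. The function names, the `base` cutoff, the power-of-two size restriction, and the plain row-major list-of-lists storage are all assumptions made for this example and are not taken from the paper.

```python
def matmul_recursive(A, B, n, base=2):
    """Return C = A * B for n x n matrices stored as lists of lists.

    Assumption for this sketch: n is a positive power of two.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    if base < 1:
        raise ValueError("base must be at least 1")
    C = [[0.0] * n for _ in range(n)]
    _multiply_add(A, B, C, 0, 0, 0, 0, 0, 0, n, base)
    return C


def _multiply_add(A, B, C, ai, aj, bi, bj, ci, cj, size, base):
    """Accumulate the product of a size x size block of A and of B into C.

    (ai, aj), (bi, bj), (ci, cj) are the top-left corners of the blocks.
    """
    if size <= base:
        # Small block: plain triple loop, row-wise over B and C.
        for i in range(size):
            for k in range(size):
                a = A[ai + i][aj + k]
                for j in range(size):
                    C[ci + i][cj + j] += a * B[bi + k][bj + j]
        return

    h = size // 2
    # Eight half-size subproblems: C[di][dj] += A[di][dk] * B[dk][dj].
    for di in (0, h):
        for dj in (0, h):
            for dk in (0, h):
                _multiply_add(A, B, C,
                              ai + di, aj + dk,
                              bi + dk, bj + dj,
                              ci + di, cj + dj,
                              h, base)
```

Calling `matmul_recursive(A, B, 4)` on two 4 x 4 matrices recurses once into 2 x 2 blocks and then runs the plain loop on each. Note that no machine-specific tile size appears anywhere in the code; `base` only limits recursion overhead and is a tuning choice of this sketch.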