Data prefetching effectively reduces the negative effects of long
load latencies on the performance of modern processors. Hardware prefetchers
employ hardware structures to predict future memory addresses based on previous
patterns. Thread-based prefetchers instead execute portions of the actual
program code to compute future load addresses. In this paper, we combine both
of these techniques to address the memory performance of pointer-based
applications. We combine a thread-based prefetcher, based on speculative
precomputation, with a pointer cache. The pointer cache is a new hardware
address predictor that tracks pointer transitions. Previously proposed
thread-based prefetchers are limited in how far they can run ahead of the main
thread in the face of recurrent dependent loads. When combined with the
pointer cache, a speculative thread can make better progress ahead of the main
thread, rapidly traversing data structures, despite pointer transition cache
misses. The pointer cache allows the consumers of a pointer load miss to issue
before the data actually arrives. Our results show that using a pointer cache
with speculative precomputation achieves a 65% speedup on average over a
speculative precomputation architecture with a larger L3 cache.
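To make the mechanism concrete, the sketch below models a pointer cache as a small tagged table mapping a pointer load's address to the pointer value it last produced. All names, table sizes, and the fabricated heap addresses are illustrative assumptions, not the paper's actual hardware organization: a main-thread traversal trains the table, after which a speculative thread can follow predicted pointer transitions without waiting on the real loads.

```python
# Hypothetical sketch of a pointer cache (illustrative only; the real
# structure is a hardware table, not a Python dict).

class PointerCache:
    def __init__(self, entries=256):
        self.entries = entries
        self.table = {}  # index -> (tag, predicted pointer value)

    def _index(self, addr):
        # Drop low bits of aligned pointers, then fold into the table.
        return (addr >> 3) % self.entries

    def update(self, addr, value):
        # Record a pointer transition: the load at `addr` produced `value`.
        self.table[self._index(addr)] = (addr, value)

    def lookup(self, addr):
        # Predict the value of a pointer load before the data arrives.
        entry = self.table.get(self._index(addr))
        if entry and entry[0] == addr:
            return entry[1]
        return None  # pointer-cache miss

# A linked list laid out at made-up memory addresses:
# node address -> address of the next node (None terminates the list).
heap = {0x1000: 0x2040, 0x2040: 0x3080, 0x3080: None}

pc = PointerCache()

# Main thread traverses once, training the pointer cache.
node = 0x1000
while node is not None:
    pc.update(node, heap[node])
    node = heap[node]

# A speculative thread now traverses purely from predictions, so the
# consumer of each pointer load can issue without waiting on memory.
node, path = 0x1000, []
while node is not None:
    path.append(node)
    node = pc.lookup(node)

print([hex(a) for a in path])
```

The tag check mirrors a cache lookup: a prediction is used only when the stored address matches, so aliased table entries fall back to a miss rather than a wrong traversal.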
Pre-2018 CSE ID: CS2002-0712