eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Exploiting non-traditional parallelization for application performance and energy efficiency in parallel systems

Abstract

Multicore processors have become ubiquitous in today's computing platforms, from smartphones to data centers. However, exploiting the parallelism that they offer remains difficult, especially for legacy applications and applications with large serial components. Even many parallel applications fail to leverage the ample hardware parallelism and hit scalability limits. This creates a gap between the available hardware parallelism and the effective software parallelism. This scenario, known as the parallelization wall, impedes the performance growth that each new processor generation used to bring. The challenge, then, is to develop techniques that allow multiple cores to work in concert to accelerate a single thread. This dissertation proposes three such techniques (software data spreading, inter-core prefetching, and load-balanced pipeline parallelism) and evaluates them on state-of-the-art real systems. These techniques are software-only and exploit application-level information to make the best use of the underlying hardware. Software data spreading migrates a thread intelligently to spread its working set over the aggregate space of the cores' private caches. This reduces expensive cache misses and dramatically improves both performance and energy efficiency when the working set fits in the aggregate cache space. Inter-core prefetching uses one or more helper threads to prefetch data in advance and uses thread migrations to access that data locally. This dissertation extends inter-core prefetching further and introduces two more techniques: underclocked software prefetching and coalition threading. The former exploits the decoupled execution model of inter-core prefetching to save power. It applies dynamic frequency scaling to the helper thread, leveraging the helper's insensitivity to frequency, so that low-frequency helper threads deliver the same performance benefits as high-frequency helper threads.
The latter technique, coalition threading, explores the potential of applying inter-core prefetching on top of traditional parallelism to improve the scalability of parallel applications. Finally, this dissertation discusses load-balanced pipeline parallelism and shows analytically how to exploit loop-level pipelining to its maximum potential.
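
The data-spreading idea summarized above can be illustrated with a small planning sketch. It is not the dissertation's implementation: the function name `spread_plan`, its parameters, and the round-robin chunk assignment are assumptions made here for illustration. It only computes which core would own each cache-sized chunk of a working set and whether the whole working set fits in the aggregate private-cache space; an actual system would additionally migrate the running thread to the owning core at each chunk boundary.

```python
def spread_plan(working_set_bytes, private_cache_bytes, num_cores):
    """Hypothetical planner: split a working set into private-cache-sized
    chunks and assign the chunks to cores in round-robin order.

    Returns (fits, plan) where
      fits -- True if the working set fits in the aggregate private-cache space
      plan -- list of (core, start_byte, end_byte) with end_byte exclusive
    """
    if working_set_bytes < 0 or private_cache_bytes <= 0 or num_cores <= 0:
        raise ValueError("sizes must be positive (working set may be zero)")

    # Aggregate capacity is the sum of all private caches (assumed equal-sized).
    aggregate_bytes = private_cache_bytes * num_cores
    fits = working_set_bytes <= aggregate_bytes

    plan = []
    start = 0
    chunk_index = 0
    while start < working_set_bytes:
        end = min(start + private_cache_bytes, working_set_bytes)
        # Round-robin ownership: if the working set does not fit, cores are
        # reused and earlier chunks would be evicted.
        core = chunk_index % num_cores
        plan.append((core, start, end))
        start = end
        chunk_index += 1
    return fits, plan


if __name__ == "__main__":
    fits, plan = spread_plan(1000, 400, 4)
    print(fits, plan)
```

When `fits` is true, each chunk can stay resident in one core's private cache across repeated passes over the data, which is the case the abstract identifies as the one where data spreading pays off.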
