As parallel systems become ubiquitous, exploiting parallelism becomes crucial
for improving application performance. However, the complexity of developing
parallel software remains a major challenge. Shared memory parallel programming
models, such as OpenMP and Threading Building Blocks (TBB), offer a single view
of memory, thereby making parallel programming easier. However, they
support limited forms of parallelism. Distributed memory programming models,
such as the Message Passing Interface (MPI), support more types of parallelism;
however, their low-level interfaces require a great deal of programming effort.
This dissertation presents the SpiceC system that simplifies the task of
parallel programming while supporting different forms of parallelism and
parallel computing platforms. SpiceC provides easy-to-use directives to
express different forms of parallelism, including DOALL, DOACROSS, and
pipelining parallelism. SpiceC is based upon an intuitive computation model
in which each thread performs its computation in isolation from other
threads using its private space and communicates with other threads
via the shared space. Since all data transfers between shared and
private spaces are explicit, SpiceC naturally supports both shared and
distributed memory platforms.
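
The computation model described above can be illustrated with a minimal sketch. This is not SpiceC syntax or its runtime: the function name `doall_square`, the block partitioning, and the lock guarding the shared space are all assumptions made for illustration. Each thread explicitly copies its slice of the shared space into a private space, computes in isolation, and explicitly commits the result back, which is the pattern of a DOALL loop under this model.

```python
import threading

def doall_square(shared, num_threads=4):
    """Square every element of `shared` using explicit copy-in / commit.

    Hypothetical illustration of the private/shared space model;
    it does not reproduce SpiceC directives or its runtime.
    """
    lock = threading.Lock()  # guards explicit transfers to and from the shared space

    def worker(tid):
        # Block partition of the iteration space (an assumed scheduling choice).
        n = len(shared)
        lo = tid * n // num_threads
        hi = (tid + 1) * n // num_threads

        # Copy-in: explicit transfer from the shared space to the private space.
        with lock:
            private = shared[lo:hi]

        # Compute in isolation; only the private copy is touched here.
        private = [x * x for x in private]

        # Commit: explicit transfer from the private space back to the shared space.
        with lock:
            shared[lo:hi] = private

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

Because the only points of contact with shared data are the copy-in and commit steps, the same program structure could in principle be mapped onto a distributed memory platform by replacing those two steps with message transfers.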
SpiceC is designed to handle the complexities of real-world applications.
The effectiveness of SpiceC is demonstrated both in terms of delivered
performance and the ease of parallelization for applications with the
following characteristics.
Applications that cannot be statically parallelized due to the presence of
dependences often contain large amounts of input-dependent, dynamic
data-level parallelism. SpiceC supports speculative parallelization
for exploiting dynamic parallelism with minimal programming effort.
Applications that operate on large data sets often make extensive use of
pointer-based dynamic data structures. SpiceC provides support for
partitioning dynamic data structures across threads and then distributing
the computation among the threads in a partition-sensitive fashion.
Finally, due to large input sizes, many applications repeatedly perform
I/O operations that are interspersed with the computation. While the traditional
approach is to execute loops that contain I/O operations serially, SpiceC
introduces support for parallelizing computations in the presence
of I/O operations.
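
The speculative parallelization mentioned among the characteristics above can be sketched in the same copy-in / commit style. This is a simplified illustration and not SpiceC's mechanism: the function name `speculative_increment`, the per-element version counters, and the out-of-order commit policy are assumptions. Each iteration copies in the element it needs together with its version, computes on the private copy, and commits only if no other iteration has committed to that element in the meantime; otherwise the iteration is re-executed. Whether two iterations conflict depends on the input `indices`, which is why such a loop cannot be parallelized statically.

```python
import threading

def speculative_increment(data, indices, num_threads=4):
    """Perform data[indices[i]] += 1 for every i, speculatively in parallel.

    Hypothetical sketch: versions detect conflicting commits, and a
    misspeculated iteration is simply re-executed.
    Returns (data, number_of_misspeculations).
    """
    versions = [0] * len(data)   # one version counter per shared element
    lock = threading.Lock()      # makes copy-in and commit atomic
    misspeculations = [0]

    def run_iteration(i):
        k = indices[i]
        while True:
            # Copy-in: snapshot the value and the version it was read at.
            with lock:
                seen_version = versions[k]
                private = data[k]

            private += 1  # compute in isolation on the private copy

            # Commit only if nobody committed to data[k] since the copy-in.
            with lock:
                if versions[k] == seen_version:
                    data[k] = private
                    versions[k] += 1
                    return
                misspeculations[0] += 1  # conflict detected: retry the iteration

    def worker(tid):
        # Cyclic distribution of iterations (an assumed scheduling choice).
        for i in range(tid, len(indices), num_threads):
            run_iteration(i)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data, misspeculations[0]
```

The sketch commits iterations in whatever order they finish, which is safe here only because increments commute; a loop whose result depends on iteration order would additionally need commits to follow the sequential order.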
In addition, this dissertation demonstrates that SpiceC can handle the challenges
posed by the memory architectures of modern parallel computing platforms.
The memory architecture impacts the manner in which data transfers between
private and shared spaces are implemented. SpiceC does not place
the burden of data transfers on the programmer. Therefore, portability of SpiceC
to different platforms is achieved by simply modifying the handling of
data transfers by the SpiceC compiler and runtime. First, it is
shown how SpiceC can be targeted to shared memory architectures both with and
without hardware support for cache coherence. Next, it is shown how accelerators
such as GPUs present in heterogeneous systems are exploited by SpiceC. Finally,
the ability of SpiceC to exploit the scalability of a distributed-memory system,
consisting of a cluster of multicore machines, is demonstrated.