Imposing Minimal Memory Ordering on Multiprocessors
Shared memory has been widely adopted as the primary system level programming abstraction on modern multiprocessor systems due to its ease of programming. To assist programmers in understanding program behaviors with respect to read and write operations originating from multiple processors, many memory consistency models have been proposed. Sequential consistency (SC) memory model is the simplest and most intuitive model, but its strict memory ordering requirements can restrict many hardware and compiler optimizations that are possible in uniprocessors. For higher performance, many manufacturers typically choose to support relaxed consistency models. In these models, memory fence instructions are also provided to permit selective overriding of default relaxed memory access ordering, where strict ordering must be enforced for program correctness. However, memory fences are costly because they cause a processor to stall.
Although the underlying memory models or memory fence instructions require a set of memory orderings to be enforced, they are not always necessary at runtime. If reordering of two memory operations executed on one processor is not observed by any other processor, then the reordering is safe as the program will behave the same as if they were not reordered. Thus we observe that it is possible to relax some of memory orderings that are in general required but are unnecessary in the current execution. The goal of this dissertation is to dynamically identify the necessary memory orderings and thus relax the rest of the required memory orderings. The challenge of achieving this goal is to efficiently identify the necessary memory orderings.
This dissertation explores programming, compiler, and hardware support for eliminating unnecessary memory orderings to improve program performance. First, conflict ordering is proposed for implementing SC efficiently, where SC is enforced by explicitly ordering only conflicting memory operations in the global memory order, instead of all memory operations. The approach achieves SC performance comparable to RMO (relaxed memory ordering), without aggressive post-retirement speculation. Next, three approaches (i.e., scoped fence, conditional fence, and address-aware fence) are proposed to relax memory orderings imposed by memory fences, representing different levels of support from programmer, compiler and hardware. These approaches make memory fences lightweight to use.