Since multicore systems offer greater performance via parallelism, future computing is progressing towards use of machines with large number of cores. However, due to the complex interaction among characteristics of multithreaded applications, operating system policies, and architectural characteristics of multicore systems, delivering high performance on multicore systems is a challenging task. This dissertation addresses the above challenge by developing runtime techniques to achieve high performance when running a single multithreaded application as well as high system utilization and fairness when running multiple multithreaded applications. The runtime techniques are based on a simple monitoring system that captures important application characteristics and relevant architectural factors with negligible overhead.
To develop runtime techniques for achieving high performance when running a single multithreaded program on a multicore system, important performance limiting factors that impact the scalability of performance are identified. These factors include the threads configuration (i.e., the number of threads for a multithreaded program that provide the best speedup) and the thread scheduling and memory allocation polices employed. This dissertation presents two runtime techniques Thread Reinforcer, for dynamically determining appropriate threads configuration and Thread Tranquilizer, for dynamically selecting appropriate scheduling and memory allocation policies. By dynamically determining the appropriate threads configuration, scheduling policy, and memory allocation policy the performance of applications is maximized.
Lock contention is an important performance limiting factor for multithreaded programs on a multicore system. The dissertation presents two techniques Thread Shuffling and FaithFul Scheduling to limit the performance impact due to locks. Thread Shuffling reduces high lock acquisition latencies, resulting from the NUMA nature of a multicore system, via inter-CPU thread migrations. FaithFul Scheduling reduces the durations for which threads hold locks by minimizing lock holder thread preemptions through adaptive time-quanta allocations. These techniques significantly enhance the performance of applications in the presence of high lock contention.
Finally, this dissertation presents a coscheduling technique called ADAPT for achieving high system utilization and fairness when running multiple multithreaded applications on multicore systems. ADAPT uses supervised learning techniques for predicting the effects of interference between programs on their performance and adaptively schedules together programs that interfere with each other's performance minimally. It achieves high throughput, high system utilization, and fairness when running multiple multithreaded applications.