Performance, Energy and Temperature Considerations for Job Scheduling and for Workload Distribution in Heterogeneous Systems
Many systems today are heterogeneous in that they consist of a mix of different types of processing units (e.g., CPUs, GPUs). Each of these processing units has different performance and energy consumption characteristics. Job scheduling and workload distribution play a crucial role in such systems, as they strongly affect the system’s performance, energy consumption, peak power and peak temperature. The scheduler maps entire jobs to processing units, whereas the workload distributor maps parts of a job. Allocating resources (e.g., core scaling, thread allocation) is another challenge, since different sets of resources behave differently in terms of performance and energy.
For years, performance was the dominant factor in job scheduling and workload distribution. Now that processor design has hit the power wall, energy consumption has become important as well. Many studies have addressed scheduling and workload distribution with an eye on performance improvement. However, few of them consider both performance and energy.
We propose a Performance, Energy and Thermal aware Resource Allocator and Scheduler (PETRAS), which includes core scaling and thread allocation. Since job scheduling is known to be an NP-hard problem, we apply a Genetic Algorithm (GA) to find an efficient job schedule in terms of performance and energy consumption, under peak power and peak CPU temperature constraints. Compared to other schedulers, PETRAS achieves up to 4.7x speedup and energy savings of up to 195%.
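To make the GA-based search concrete, the following is a minimal sketch and not the PETRAS implementation. All numbers, the two-unit system, the energy-delay-product fitness, and the simplified peak-power model are assumptions made for illustration; the temperature constraint, core scaling and thread allocation are omitted for brevity.

```python
import random

# Hypothetical data: TIME[j][u] in seconds and POWER[j][u] in watts
# for job j on unit u (0 = CPU, 1 = GPU). Not taken from the source.
TIME = [[8.0, 3.0], [5.0, 6.0], [9.0, 2.0], [4.0, 4.5], [7.0, 2.5], [3.0, 5.0]]
POWER = [[40.0, 90.0], [35.0, 80.0], [45.0, 95.0], [30.0, 70.0], [40.0, 85.0], [25.0, 75.0]]
PEAK_POWER_CAP = 130.0  # assumed peak power constraint in watts


def evaluate(schedule):
    """Return (makespan, energy, peak_power) for a job-to-unit mapping."""
    busy = [0.0, 0.0]   # total busy time per unit
    peak = [0.0, 0.0]   # highest job power seen per unit
    energy = 0.0
    for job, unit in enumerate(schedule):
        busy[unit] += TIME[job][unit]
        energy += TIME[job][unit] * POWER[job][unit]
        peak[unit] = max(peak[unit], POWER[job][unit])
    # Assumed worst case: the hungriest job on each unit runs at the same time.
    return max(busy), energy, sum(peak)


def cost(schedule):
    """Energy-delay product; schedules that break the power cap are rejected."""
    makespan, energy, peak = evaluate(schedule)
    if peak > PEAK_POWER_CAP:
        return float("inf")
    return makespan * energy


def genetic_schedule(generations=60, pop_size=20, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n_jobs, n_units = len(TIME), len(TIME[0])
    # Seed the population with the all-CPU schedule, then fill randomly.
    population = [[0] * n_jobs]
    population += [[rng.randrange(n_units) for _ in range(n_jobs)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [min(population, key=cost)[:]]  # elitism: keep the best
        while len(children) < pop_size:
            # Tournament selection of two parents.
            a = min(rng.sample(population, 3), key=cost)
            b = min(rng.sample(population, 3), key=cost)
            cut = rng.randrange(1, n_jobs)          # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally remap a job to a random unit.
            child = [rng.randrange(n_units) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return min(population, key=cost)


if __name__ == "__main__":
    best = genetic_schedule()
    print(best, evaluate(best))
```

Because the best individual is carried over unchanged each generation and the all-CPU schedule is in the initial population, the returned schedule is never worse than the all-CPU baseline under this cost model.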
Classic workload distribution maps the sequential parts of a job to the CPU and the parallel parts to the GPU, and therefore does not fully utilize the CPUs and the GPUs. We thus propose a Workload Distributor with a Resource Allocator (WDRA), which combines core scaling and thread allocation into a workload distributor. Since workload distribution is known to be an NP-hard problem, WDRA utilizes Particle Swarm Optimization (PSO) to find an efficient workload distribution in terms of performance and energy consumption, under peak power and peak CPU temperature constraints. Compared to other workload distributors, WDRA can achieve up to 1.47x speedup and an 82% reduction in energy consumption. WDRA is well suited as a runtime distributor, since it takes at most 1.7% of the job’s execution time.
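As a rough illustration of PSO-driven workload distribution with core scaling, the sketch below searches over two variables: the share of parallel work sent to the GPU and the number of active CPU cores. It is not the WDRA algorithm. The throughput and power figures, the energy-delay-product objective, and the linear power model are all hypothetical, and the temperature constraint is left out.

```python
import random

# Hypothetical system parameters (not from the source).
WORK = 1000.0        # units of parallel work in the job
GPU_RATE = 50.0      # work units per second on the GPU
CORE_RATE = 4.0      # work units per second per CPU core
GPU_POWER = 120.0    # watts drawn by the GPU
CORE_POWER = 10.0    # watts per active CPU core
MAX_CORES = 8
POWER_CAP = 180.0    # assumed peak power constraint in watts
BOUNDS = [(0.0, 1.0), (1.0, float(MAX_CORES))]  # (gpu_share, cores)


def cost(position):
    """Energy-delay product of a distribution; infinite if over the power cap."""
    gpu_share, cores = position[0], round(position[1])
    power = GPU_POWER + CORE_POWER * cores
    if power > POWER_CAP:
        return float("inf")
    # CPU and GPU work concurrently, so the slower side sets the run time.
    time = max(gpu_share * WORK / GPU_RATE,
               (1.0 - gpu_share) * WORK / (CORE_RATE * cores))
    return time * (time * power)


def pso_distribute(particles=15, iterations=80, seed=0):
    rng = random.Random(seed)
    # First particle is the classic mapping: all parallel work on the GPU, one core.
    pos = [[1.0, 1.0]]
    pos += [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(particles - 1)]
    vel = [[0.0, 0.0] for _ in pos]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=cost)[:]
    for _ in range(iterations):
        for i, p in enumerate(pos):
            for d, (lo, hi) in enumerate(BOUNDS):
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - p[d])
                             + 1.5 * rng.random() * (gbest[d] - p[d]))
                p[d] = min(hi, max(lo, p[d] + vel[i][d]))  # clamp to bounds
            if cost(p) < cost(pbest[i]):
                pbest[i] = p[:]
                if cost(p) < cost(gbest):
                    gbest = p[:]
    return gbest[0], round(gbest[1])


if __name__ == "__main__":
    share, cores = pso_distribute()
    print(f"GPU share: {share:.3f}, CPU cores: {cores}")
```

Since the global best is only replaced by a strictly better position and the classic mapping is part of the initial swarm, the result under this model is never worse than sending all parallel work to the GPU.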