Methods for multitasking among real-time embedded compute tasks running on the GPU
Published Web Locationhttps://doi.org/10.1002/cpe.4118
In this study, we provide an extensive survey on wide spectrum of scheduling methods for multitasking among graphics processing unit (GPU) computing tasks. We then design several schedulers and explain in detail the selected methods we have developed to implement our scheduling strategies. Next, we compare the performance of schedulers on various workloads running on Fermi and Kepler architectures and arrive at the following major conclusions: (1) Small kernels benefit from running kernels concurrently. (2) The combination of small kernels, high-priority kernels with longer runtimes, and lower-priority kernels with shorter runtimes benefits from a CPU scheduler that dynamically changes kernel order on the Fermi architecture. (3) Because of limitations of existing GPU architectures, currently CPU schedulers outperform their GPU counterparts. We also provide results and observations obtained from implementing and evaluating our schedulers on the NVIDIA Jetson TX1 system-on-chip architecture. We observe that although TX1 has the newer Maxwell architecture, the mechanism used for scheduler timings behaves differently on TX1 compared to Kepler leading to incorrect timings. In this paper, we describe our methods that allow us to report correct timings for CPU schedulers running on TX1. Finally, we propose new research directions involving the investigation of additional scheduling strategies.