General-purpose graphics processing units (GPUs) are considered a promising technology for meeting the high computational demands of real-time data-intensive applications. Many of today's embedded processors already provide on-chip GPUs, which can help data-intensive tasks meet their timing requirements by accelerating their execution. However, state-of-the-art GPU management in real-time systems still lacks properties required for efficient and certifiable real-time GPU computing. For example, existing real-time systems execute GPU workloads sequentially to guarantee predictable GPU access time, which significantly underutilizes the GPU and exacerbates temporal dependency among the workloads.
In this research, we propose a spatial-temporal GPU management framework for real-time cyber-physical systems. The proposed framework explicitly manages the allocation of the GPU's internal execution engines. This approach allows multiple GPU-using tasks to execute simultaneously on the GPU, thereby improving GPU utilization and reducing response time. It can also improve temporal isolation by allocating a portion of the GPU execution engines to tasks for their exclusive use. We have implemented a prototype of the proposed framework for a CUDA environment. A case study using this implementation on two NVIDIA GPUs, GeForce 970 and Jetson TX2, shows that our framework reduces the response time of GPU execution segments in a predictable manner by executing them in parallel. Experimental results with randomly generated tasksets indicate that our framework yields a significant benefit in schedulability compared to the existing approach.
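
The contrast between sequential GPU access and explicit allocation of execution engines can be illustrated with a toy model. The sketch below is not the framework's implementation or its schedulability analysis. It assumes, purely for illustration, that a GPU segment's execution time scales linearly with the number of execution engines it is given, up to a per-segment limit. The names `Segment`, `exec_time`, `sequential_response`, and `partitioned_response`, as well as the workload numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A GPU execution segment in a toy model (all fields are assumptions)."""
    name: str
    work: float    # total work, in engine-milliseconds
    max_engines: int  # most execution engines the segment can keep busy


def exec_time(seg: Segment, engines: int) -> float:
    """Execution time assuming linear scaling up to seg.max_engines."""
    if engines <= 0:
        raise ValueError("a segment needs at least one execution engine")
    return seg.work / min(engines, seg.max_engines)


def sequential_response(segments, total_engines):
    """Segments run one at a time, each with access to the whole GPU."""
    finish = {}
    clock = 0.0
    for seg in segments:
        clock += exec_time(seg, total_engines)
        finish[seg.name] = clock  # includes waiting for earlier segments
    return finish


def partitioned_response(segments, allocation, total_engines):
    """Segments run concurrently, each on its own exclusive engines."""
    if sum(allocation.values()) > total_engines:
        raise ValueError("allocation exceeds the available execution engines")
    return {seg.name: exec_time(seg, allocation[seg.name]) for seg in segments}


if __name__ == "__main__":
    # Two segments that can each use at most half of an 8-engine GPU.
    segs = [Segment("A", 40.0, 4), Segment("B", 40.0, 4)]
    print(sequential_response(segs, 8))                     # {'A': 10.0, 'B': 20.0}
    print(partitioned_response(segs, {"A": 4, "B": 4}, 8))  # {'A': 10.0, 'B': 10.0}
```

In this model, sequential execution leaves half of the engines idle while each segment runs, so segment B finishes only after waiting for A. With an exclusive allocation of four engines each, both segments finish at the same time and neither depends on the other's execution time.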