Reliability- and Timing-Aware GPU Management on Embedded Systems
The demand for low-power, high-performance computing has driven the semiconductor industry for decades. To satisfy these demands, semiconductor technology has been scaled down and multi/many-core processors have been introduced. Among multi/many-core processors, Graphics Processing Units (GPUs) have been placed on the critical path of applications due to their programmability, high performance, and low power consumption. Moreover, state-of-the-art GPUs can process multiple GPU workloads concurrently. GPUs have therefore become an essential part of embedded systems, driven by the growing number of throughput-oriented applications on real-time embedded platforms, such as autonomous driving and advanced driver-assistance applications. However, using GPUs in embedded systems raises several challenges.
First, due to their small feature sizes, state-of-the-art nano-scale multi-core processors, including GPUs, face severe reliability challenges such as soft errors and processor degradation. Second, advanced semiconductor technologies exhibit noticeable die-to-die and within-die parameter variation. Consequently, lifetime and workload management of embedded GPUs under process variation is one of the most important requirements for ensuring functional correctness over a long period of time. Finally, existing application scheduling frameworks for GPUs lack the flexibility to handle the dynamic behavior of multiple event-driven applications.
To tackle the aforementioned challenges, this thesis proposes a reliability- and timing-aware workload management framework for GPU-based real-time embedded systems. The proposed framework consists of two parts: design-time and run-time workload management. The design-time workload management unit analyzes GPU kernel functions and generates PTX instruction schedules that maximize soft-error reliability. At the same time, application profiles are generated for run-time workload management. The run-time workload management unit comprises two parts: a Streaming Multiprocessor (SM) scheduling unit and an aging-aware workload distribution unit. At run-time, depending on the system status and requirements, the scheduling unit partitions GPU workloads into sub-workloads and generates sub-workload launch sequences to handle the dynamic behavior of event-driven applications. Concurrently, the aging-aware workload distribution unit jointly considers the current aging status and the process-variation status of each SM and distributes the workload across the SMs to maximize the lifetime of the GPU.
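The run-time flow described above can be illustrated with a minimal sketch. Everything here is hypothetical scaffolding rather than the thesis's actual implementation: the `SubWorkload` and `SMState` structures, the fixed chunk size, the earliest-deadline-first launch ordering, the combined aging-times-variation stress metric, and the per-block wear increment are all illustrative assumptions chosen to show the shape of the two run-time steps (partition/sequence, then aging-aware distribution).

```python
from dataclasses import dataclass

@dataclass
class SubWorkload:
    kernel: str        # kernel name (illustrative)
    blocks: int        # number of thread blocks in this sub-workload
    deadline_ms: float # timing requirement inherited from the parent workload

@dataclass
class SMState:
    sm_id: int
    aging: float       # accumulated wear; higher means more degraded (hypothetical metric)
    variation: float   # process-variation factor; > 1.0 means a weaker die sample

def partition(kernel: str, total_blocks: int, deadline_ms: float, chunk: int):
    """Split a GPU workload into fixed-size sub-workloads (assumed policy)."""
    subs = []
    remaining = total_blocks
    while remaining > 0:
        n = min(chunk, remaining)
        subs.append(SubWorkload(kernel, n, deadline_ms))
        remaining -= n
    return subs

def launch_sequence(subs):
    """Order sub-workloads by earliest deadline first (a stand-in ordering)."""
    return sorted(subs, key=lambda s: s.deadline_ms)

def distribute(sub: SubWorkload, sms):
    """Send a sub-workload to the SM with the lowest combined
    aging * variation stress, then charge that SM for the work."""
    target = min(sms, key=lambda sm: sm.aging * sm.variation)
    target.aging += sub.blocks * 0.01  # hypothetical per-block wear increment
    return target.sm_id
```

For example, a 10-block kernel with a 4-block chunk size yields sub-workloads of 4, 4, and 2 blocks, and a lightly worn SM on a strong die sample is preferred over an equally worn SM on a weaker one. A real system would replace the wear increment with a measured or modeled degradation rate (e.g., from NBTI/HCI models) and the EDF ordering with whatever timing policy the application set requires.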