On Quality-of-Service and Memory Efficiency in Heterogeneous MPSoCs
Due to their energy efficiency, heterogeneous Multi-Processor Systems-on-Chip (MPSoCs) are widely deployed as the engines of modern smart devices, and as such devices prevail, these MPSoCs have become ubiquitous in our daily lives. They typically integrate a diverse collection of cores, including real-time agents such as the GPU, the DSP, and the video codec, as well as general-purpose cores such as the CPU. Unlike traditional multi-core processors, where memory sharing is mostly handled by the cache hierarchy, a heterogeneous system mainly uses DRAM as the medium for data sharing. This makes the memory subsystem, including the networks-on-chip (NoC), last-level caches, and memory controllers, a frequently shared resource among heterogeneous cores.
As traffic streams travel through the shared memory subsystem, memory interference is inevitable and is further worsened by the cores' disparate traffic patterns. As a result, the memory subsystem plays a crucial role in heterogeneous systems and has a tremendous impact on system performance.
In particular, the design goal of the memory subsystem is two-fold: delivering end-to-end Quality-of-Service (QoS) to heterogeneous cores and improving memory efficiency in DRAM.
In this dissertation, we explore several design aspects of the memory subsystem for heterogeneous MPSoCs. First, we propose a self-aware resource allocation framework that enables distributed performance monitoring, priority-based adaptation, and instant memory response. The proposed framework meets a diverse range of QoS demands from real-time cores. It can also configure itself at runtime to accommodate best-effort cores and real-time cores that are particularly prone to QoS failure. In the remainder of the dissertation, we examine each major memory fabric in turn. We present the single-tier virtual queuing memory controller, which maintains a single tier of request queues and employs an effective scheduler that considers both QoS requirements and DRAM bank states. We then study the orchestration between the last-level cache controller and the memory scheduler to improve memory visibility and avoid unnecessary precharges and row activations in DRAM. Finally, we introduce a locality-aware NoC router that prevents row-buffer locality dilution while providing QoS-aware service to heterogeneous cores.
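To make the scheduling idea concrete, the sketch below illustrates, in simplified form, how a memory scheduler can weigh both QoS priority and DRAM bank state (row-buffer hits) when selecting the next request from a single tier of queues. This is an illustrative toy, not the dissertation's actual scheduler; the `Request` fields, the priority encoding, and the ranking order are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Request:
    core_priority: int   # assumed encoding: higher = more latency-sensitive (real-time core)
    bank: int            # target DRAM bank
    row: int             # target row within the bank
    arrival: int         # arrival timestamp, used as the final (oldest-first) tie-breaker

def pick_next(queue, open_rows):
    """Select the next DRAM request from a single-tier request queue.

    Illustrative ranking: QoS priority of the issuing core first, then
    row-buffer hits (the target row is already open in the bank, so no
    precharge/activate is needed), then age. A real scheduler would also
    track timing constraints and bank readiness.
    """
    def rank(r):
        row_hit = open_rows.get(r.bank) == r.row
        # Python compares tuples element-wise; False sorts before True,
        # so row hits rank ahead of misses at equal priority.
        return (-r.core_priority, not row_hit, r.arrival)
    return min(queue, key=rank)

# Toy usage: bank 0 currently has row 7 open in its row buffer.
open_rows = {0: 7}
queue = [
    Request(core_priority=0, bank=0, row=7, arrival=1),  # best-effort, row hit
    Request(core_priority=2, bank=0, row=3, arrival=2),  # real-time, row miss
    Request(core_priority=2, bank=1, row=5, arrival=3),  # real-time, row miss
]
best = pick_next(queue, open_rows)
# The two real-time requests outrank the best-effort row hit; among them,
# both are misses, so the older one (arrival=2) is chosen.
```

The point of the example is the tension the scheduler must resolve: serving row hits maximizes DRAM efficiency, while serving high-priority cores first delivers QoS, and the ranking order decides which concern dominates.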