UC Irvine Electronic Theses and Dissertations

System Software Support for FPGA-based Multi-Accelerator Architectures in Edge Computing Systems

Creative Commons 'BY' version 4.0 license
Abstract

Edge computing plays a key role in providing low-latency and high-availability services for emerging Internet of Things (IoT) applications. IoT devices increasingly harness Deep Neural Networks (DNNs) to provide intelligent capabilities. With the growing computing demands of IoT DNN applications, edge computing systems have evolved to employ hardware accelerators alongside multi-core processors to enhance processing power. Among hardware accelerators, Field Programmable Gate Arrays (FPGAs) have gained increased attention due to their reconfigurability, high performance, and power efficiency. The FPGA fabric allows multiple workloads to be spatially co-located and executed concurrently on the shared resource. With performance variability arising from application requirements and contention for limited shared computing resources, the edge system poses significant challenges for efficiently scheduling application tasks and allocating resources to them. Without proper management, the system can fall into sub-optimal resource partitioning and utilization and suffer increased application completion latency. Modern edge systems therefore require a flexible and efficient mechanism to dynamically partition shared resources and schedule tasks based on application requirements and available resources. In many IoT monitoring systems, however, sensing occurs at a fixed rate, so data is sent to the edge for acceleration periodically. When multiple IoT devices continuously send acceleration requests to the edge, regular patterns in the application workload can be observed. Such regularity creates optimization opportunities for the system to support multi-tenancy in sharing computing resources.

In the first part of the dissertation, I discuss emerging edge computing systems that allow IoT end devices to offload compute-intensive tasks, i.e., DNNs, to the edge. I review efforts to integrate FPGAs and deploy DNNs on the resource-constrained edge. Because efficiently managing task scheduling and resource allocation among concurrent DNN applications co-located on a multi-accelerator edge system is challenging, I also discuss existing resource management approaches for multi-tenant FPGA-based edge systems and their limitations.

Given that DNN applications often have similar or shared requirements, e.g., dataset types and accuracy, I focus on developing a DNN-accelerator sharing system at the FPGA edge device that serves various DNN applications from multiple end devices simultaneously. The proposed SharedDNN/PlanAhead policy exploits the regularity among requests for various DNN accelerators and determines which accelerator to allocate to each request, and in what order to respond to the requests, so as to achieve maximum responsiveness for a queue of acceleration requests. The framework exploits the a priori known pattern of input task arrivals and matches suitable accelerators to tasks according to the utilization of shared FPGA resources and each application's requirements, as sketched below.
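The sketch below illustrates one way such a plan-ahead allocation pass could look, using a toy request/accelerator model in Python. The data classes, the greedy earliest-deadline-first ordering, and the smallest-fitting-accelerator choice are assumptions for illustration only, not the dissertation's actual SharedDNN/PlanAhead implementation.

```python
# Hypothetical sketch of a PlanAhead-style allocation pass (names and heuristics
# are illustrative assumptions, not the dissertation's implementation).
from dataclasses import dataclass

@dataclass
class Request:
    app: str            # requesting application
    deadline: float     # latest acceptable response time (s)
    min_accuracy: float # accuracy the application requires

@dataclass
class Accelerator:
    name: str
    accuracy: float     # accuracy delivered by this DNN accelerator variant
    area: int           # FPGA resource units it occupies

def plan_ahead(queue, accelerators, area_budget):
    """Greedily match each request to the smallest accelerator that meets its
    accuracy requirement, serving the most urgent requests first."""
    schedule, used = [], 0
    for req in sorted(queue, key=lambda r: r.deadline):   # earliest deadline first
        fits = [a for a in accelerators
                if a.accuracy >= req.min_accuracy and used + a.area <= area_budget]
        if not fits:
            continue                                      # request dropped or deferred
        best = min(fits, key=lambda a: a.area)            # cheapest qualifying accelerator
        used += best.area
        schedule.append((req.app, best.name))
    return schedule

if __name__ == "__main__":
    reqs = [Request("cam-1", 0.05, 0.90), Request("mic-2", 0.02, 0.70)]
    accs = [Accelerator("dnn-small", 0.88, 2), Accelerator("dnn-large", 0.95, 5)]
    print(plan_ahead(reqs, accs, area_budget=6))
```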

When multiple end devices consistently send tasks to the edge, patterns of applications emerge in the task queue at the edge. I then present a systematic approach that exploits these application patterns and employs a mixed offline/online multi-queue scheduling method to optimize responsiveness by reducing response time and minimizing task drops for consistent IoT DNN workloads. The proposed framework not only exploits the regularity of historical data and extracts patterns but also provides an adaptive online scheduler to mitigate the effects of noise and fluctuations due to network delays and system workload congestion; a sketch of this two-phase structure follows.
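The following Python sketch shows how such a two-phase design might be organized, with an offline pass estimating per-application inter-arrival gaps from history and an online dispatcher that drops tasks whose deadlines have passed. The class, its methods, and the queue-selection heuristic are hypothetical simplifications rather than the framework's actual interface.

```python
# Illustrative mixed offline/online multi-queue scheduler; names are assumptions.
from collections import defaultdict, deque
from statistics import mean

class MultiQueueScheduler:
    def __init__(self, history):
        # Offline phase: estimate each application's typical inter-arrival gap
        # from historical (app, prev_arrival, next_arrival) records.
        gaps = defaultdict(list)
        for app, t_prev, t_next in history:
            gaps[app].append(t_next - t_prev)
        self.expected_gap = {app: mean(g) for app, g in gaps.items()}
        self.queues = defaultdict(deque)   # one queue per application

    def submit(self, app, task, deadline):
        self.queues[app].append((task, deadline))

    def dispatch(self, now):
        # Online phase: serve the non-empty queue with the tightest expected
        # arrival cadence, dropping tasks whose deadlines have already passed.
        ready = [app for app, q in self.queues.items() if q]
        if not ready:
            return None
        app = min(ready, key=lambda a: self.expected_gap.get(a, float("inf")))
        task, deadline = self.queues[app].popleft()
        return None if deadline < now else (app, task)
```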

Lastly, considering that IoT applications can be event-driven, with task rates that vary over time, the input workload experiences dynamic and uncertain changes. To adapt to such uncertainty, I demonstrate a learning-based multi-accelerator management framework that asynchronously learns the scheduling and allocation policy and dynamically partitions shared resources to maximize system performance through interaction with the edge, as illustrated below.
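As a rough illustration of the learning-based idea, the sketch below uses tabular Q-learning with epsilon-greedy exploration to pick among candidate resource partitions and to update the policy from observed rewards (e.g., throughput minus a task-drop penalty). The learner, state/action encoding, and reward shaping are assumptions for illustration and may differ from the framework described in the dissertation.

```python
# Minimal learning-based allocation sketch (tabular Q-learning, illustrative only).
import random
from collections import defaultdict

class LearnedAllocator:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.actions = actions        # e.g., candidate FPGA resource partitions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)                       # explore
        return max(self.actions, key=lambda a: self.q[(state, a)])   # exploit

    def update(self, state, action, reward, next_state):
        # Reward could combine observed throughput and task-drop penalties.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```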

In this dissertation, I investigate system software for FPGA-based multi-accelerator management that schedules and allocates tasks onto FPGA edge systems in the presence of various IoT DNN workloads. The experimental results show significant improvements in response time and task drops when the regularity of input workloads is exploited and the mixed offline/online-phase system software is deployed. In addition, when the workload experiences dynamic and uncertain changes and its regularity becomes unpredictable and hard to extract and generalize, the learning-based multi-accelerator management framework allows the system to capture these dynamics and find an adaptive scheduling policy that accommodates the uncertainty. The experimental results show improved average throughput and task drop rates compared to state-of-the-art heuristic and learning-based approaches.
