eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

High-throughput Data Systems for Deep Learning Workloads

Abstract

Artificial Intelligence (AI) and Deep Learning (DL) have gained enormous popularity and have seen wide adoption across different domains. They have ushered in an era of huge workloads that are increasingly computation- and data-intensive and that put existing data analytics infrastructures and systems to the test. However, many of these workloads run with severe inefficiency and face tremendous scalability challenges due to suboptimal scheduling and poor resource and memory management, resulting in wasted computational and storage resources. Furthermore, DL workloads have predominantly been run on custom software frameworks far from where most enterprise and operational data resides: databases and data systems. We observe that a significant gap exists between existing data systems and DL workloads. Data is often stored in the former but must frequently be exported to the latter. These large data movements between data systems and DL systems waste storage, network bandwidth, and time. They also create difficulties in data governance, provenance tracking, and compliance with data privacy regulations. Most importantly, large-scale data systems were at the center of data analytics before the recent takeover by DL, yet many of their lessons and techniques still apply to DL workloads, and opportunities to innovate upon existing infrastructures are being missed. This dissertation focuses on modern DL workloads and on these challenges of system efficiency, scalability, and practicality. We aim to raise the throughput of DL systems at large data scale without sacrificing practicality, using a central methodology of reimagining DL systems as DL data systems. On the one hand, we apply and innovate on techniques inspired by database management systems, such as multi-query optimization, query plan rewrites, and approximate processing, for DL workloads. On the other hand, we explore novel ways to extend existing data systems, without modifying their core codebases, to run DL workloads, both bridging the gap and offering tangible data scalability benefits. All proposed techniques and systems are empirically tested and demonstrated to show improvements, sometimes over 10x, compared to state-of-the-art solutions.
