UC Berkeley Electronic Theses and Dissertations
Scalable and Efficient Systems for Large Deep Learning Models

Abstract

Recent advances in machine learning have primarily been driven by large-scale deep learning models, particularly large language models. The scale and new capabilities of these models pose challenges for designing the infrastructure systems that support their entire lifecycle, from training and serving to evaluation. To meet their high computational and memory requirements while fully utilizing and accurately evaluating their capabilities, we need to redesign many system components, such as compilers, distributed computing platforms, programming systems, and evaluation methods.

In this dissertation, we introduce a suite of systems designed and built to support large models across the training, serving, and evaluation phases. First, we discuss Alpa, a system for large-scale model-parallel training that automatically generates distributed execution plans combining both inter- and intra-operator parallelism. Moving on to serving, we introduce Ansor, a compiler that generates high-performance implementations of tensor programs for various hardware backends. We also explore SGLang, a system for deploying large language models that couples a flexible front-end programming interface with an optimized back-end runtime for fast inference. Lastly, for evaluation, we detail Chatbot Arena, a crowdsourced live benchmark platform, and LLM-as-a-Judge, an automated evaluation pipeline. These tools collectively form a full-stack system for the continuous improvement of large models.
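
To make the front-end programming interface concrete, here is a minimal sketch of an SGLang program in Python. It assumes an SGLang server is already running locally; the endpoint URL, function name, and prompt text are illustrative and not taken from the dissertation.

    import sglang as sgl

    # Point the front-end language at a running SGLang back-end runtime.
    # (Assumes a server was launched separately on this illustrative port.)
    sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

    # An SGLang "function" interleaves prompt construction with generation
    # calls; the back-end runtime schedules the underlying LLM requests.
    @sgl.function
    def answer_question(s, question):
        s += sgl.user(question)
        s += sgl.assistant(sgl.gen("answer", max_tokens=128))

    state = answer_question.run(question="What is inter-operator parallelism?")
    print(state["answer"])

The decorator-based design lets ordinary Python control flow drive multi-call generation programs, while the runtime optimizes execution behind the scenes.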

Main Content

This item is under embargo until September 27, 2026.