Hierarchical Heterogeneous Cluster Systems for Scalable Distributed Deep Learning
UC Irvine Electronic Theses and Dissertations

Abstract

A distributed deep learning framework should aim for highly efficient training and inference of distributed, exascale deep learning algorithms. There are three major challenges in this endeavor: scalability, adaptivity, and efficiency. Any future framework must adapt to a variety of heterogeneous hardware and network environments, and must therefore be capable of scaling from a single compute node up to large clusters. Further, it should integrate efficiently with popular frameworks such as TensorFlow and PyTorch.

This dissertation proposes a dynamic hybrid (hierarchical) distribution structure for distributed deep learning that takes advantage of flexible synchronization across both centralized and decentralized architectures and implements multi-level, fine-grained parallelism on distributed platforms. The structure scales as the number of compute nodes increases, and it adapts to varying compute capabilities, memory structures, and communication costs.
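To make the hybrid centralized/decentralized idea concrete, the sketch below shows one common way such a two-level hierarchy can be organized: workers are partitioned into groups, each group first averages its gradients internally (a centralized, parameter-server-style step), and the group-level averages are then combined across groups (a decentralized, all-reduce-style step). All names and the grouping scheme are illustrative assumptions, not the dissertation's actual implementation.

```python
# Hypothetical sketch of two-level hierarchical gradient synchronization.
# Gradients are plain Python lists of floats so the example is
# self-contained; a real system would use tensors and collective
# communication primitives (e.g. NCCL/MPI all-reduce).

def group_average(grads):
    """Centralized step: average the gradient vectors within one group."""
    n = len(grads)
    return [sum(vals) / n for vals in zip(*grads)]

def hierarchical_sync(groups):
    """Two-level sync: intra-group average, then inter-group combine.

    `groups` is a list of groups; each group is a list of per-worker
    gradient vectors. The inter-group step weights each group's average
    by its size so the result equals the global average over all workers.
    """
    # Level 1: each group computes its local average (its "leader" value).
    leaders = [group_average(g) for g in groups]
    # Level 2: leaders combine their averages across groups.
    total = sum(len(g) for g in groups)
    return [
        sum(len(g) * leader[i] for g, leader in zip(groups, leaders)) / total
        for i in range(len(leaders[0]))
    ]

# Example: two groups of unequal size (heterogeneous cluster layout).
groups = [
    [[1.0, 2.0], [3.0, 4.0]],  # group 0: two workers
    [[5.0, 6.0]],              # group 1: one worker
]
print(hierarchical_sync(groups))  # → [3.0, 4.0], the global mean over all 3 workers
```

The size-weighted second level is what keeps the hierarchy mathematically equivalent to a flat global average while letting each level use the synchronization strategy (and network path) best suited to its hardware.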
