Mathuriya, Amrita; Bard, Deborah; Mendygral, Peter; Meadows, Lawrence; Arnemann, James; Shao, Lei; He, Siyu; Kärnä, Tuomas; Moise, Diana; Pennycook, Simon J; Maschhoff, Kristyn; Sewall, Jason; Kumar, Nalini; Ho, Shirley; Ringenburg, Michael F; Prabhat; Lee, Victor

doi:10.1109/sc.2018.00068


CosmoFlow: Using Deep Learning to Learn the Universe at Scale

2018

Published Web Location

https://doi.org/10.1109/sc.2018.00068

Abstract

Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel® Xeon Phi™ processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. These enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters Ω_M, σ_8 and n_s with unprecedented accuracy.
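As a rough illustration of what "fully synchronous data-parallel training" means, the following minimal sketch simulates several workers in a single Python process. It is not the CosmoFlow code and does not use TensorFlow or the Cray PE Machine Learning Plugin; the toy one-parameter model, the helper names (`local_gradient`, `allreduce_mean`, `synchronous_step`) and the data are all assumptions made for illustration. Each simulated worker holds an identical copy of the model, computes a gradient on its own shard of the data, and applies the same averaged gradient, so the replicas never diverge.

```python
# Minimal sketch of fully synchronous data-parallel SGD (illustrative only;
# not the CosmoFlow implementation). All names here are hypothetical.

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for a collective allreduce: every worker gets the same mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def synchronous_step(weights, shards, lr):
    """One step: each worker computes a gradient, then all apply the average."""
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    averaged = allreduce_mean(grads)
    return [w - lr * g for w, g in zip(weights, averaged)]

# Toy data generated from y = 3x, split evenly across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
num_workers = 4
shards = [data[i::num_workers] for i in range(num_workers)]
weights = [0.0] * num_workers  # identical replicas of the single parameter

for _ in range(200):
    weights = synchronous_step(weights, shards, lr=0.01)
```

Because the shards are the same size, the averaged gradient equals the gradient over the full data set, so the step taken matches single-worker training on the combined batch. In a real multi-node run the averaging would be a network collective rather than a local function call.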


Lawrence Berkeley National Laboratory
