Deep learning is a promising tool for determining
the physical model that describes our universe. To handle the
considerable computational cost of this problem, we present
CosmoFlow: a highly scalable deep learning application built
on top of the TensorFlow framework. CosmoFlow uses efficient
implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise
operations, to improve training performance on Intel® Xeon
Phi™ processors. We also utilize the Cray PE Machine Learning
Plugin for efficient scaling to multiple nodes.
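As a rough illustration of what multi-node scaling involves in the fully synchronous data-parallel setting reported here, the sketch below simulates several workers in a single process. Each worker computes a gradient on its own data shard, the gradients are averaged, and every replica applies the same update. This is not the Cray PE Machine Learning Plugin API: the names `local_gradient`, `allreduce_mean`, and `synchronous_step`, the toy least-squares model, and the learning rate are all assumptions made for illustration.

```python
# Toy model (assumed for illustration): fit y = w * x with one scalar weight.

def local_gradient(w, shard):
    """Mean gradient of 0.5 * (w*x - y)**2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Hypothetical stand-in for a collective: every worker gets the same mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def synchronous_step(weights, shards, lr):
    """One fully synchronous step: no replica updates until all gradients are in."""
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    averaged = allreduce_mean(grads)
    return [w - lr * g for w, g in zip(weights, averaged)]

data = [(float(x), 3.0 * x) for x in range(1, 9)]            # true weight is 3.0
num_workers = 4
shards = [data[i::num_workers] for i in range(num_workers)]  # equal-size shards
weights = [0.0] * num_workers                                # identical replicas

for _ in range(200):
    weights = synchronous_step(weights, shards, lr=0.01)
```

Because the shards have equal size, the averaged gradient equals the gradient over the full batch, so all replicas stay identical and converge to the same weight. In a real deployment the averaging would be a communication collective across nodes rather than a Python function.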
We demonstrate fully synchronous data-parallel training on
8192 nodes of Cori with 77% parallel efficiency, achieving 3.5
Pflop/s sustained performance. To our knowledge, this is the
first large-scale science application of the TensorFlow framework
at supercomputer scale with fully synchronous training. These
enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters Ω_M, σ_8, and n_s
with unprecedented accuracy.
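The scaling figures quoted above can be unpacked with a little arithmetic. The sketch below assumes that parallel efficiency is defined as the measured speedup over a single node divided by the node count. The per-node rate it computes is an average derived from the aggregate figure, not a measurement reported here.

```python
def parallel_efficiency(throughput_n, throughput_1, num_nodes):
    """Efficiency = measured speedup over one node, divided by node count."""
    return throughput_n / (num_nodes * throughput_1)

nodes = 8192              # node count stated in the text
efficiency = 0.77         # parallel efficiency stated in the text
sustained_flops = 3.5e15  # 3.5 Pflop/s sustained, stated in the text

# Speedup over a single node implied by the stated efficiency.
effective_speedup = efficiency * nodes

# Average sustained rate per node implied by the aggregate figure.
per_node_flops = sustained_flops / nodes

print(f"effective speedup: {effective_speedup:.0f}x")
print(f"average per-node rate: {per_node_flops / 1e9:.0f} Gflop/s")
```

Under that definition, 77% efficiency on 8192 nodes corresponds to a speedup of roughly 6308 over a single node, and 3.5 Pflop/s spread across 8192 nodes averages to about 427 Gflop/s per node.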