The Cori system at NERSC has two compute
partitions with different CPU architectures: a 2,004-node
Haswell partition and a 9,688-node KNL partition, which
ranked as the 5th most powerful supercomputer
on the November 2016 Top 500 list. The compute partitions
share a common storage configuration, and understanding the
IO performance gap between them is important
not only to NERSC/LBNL users and other national labs, but
also to the relevant hardware vendors and software developers.
In this paper, we have comprehensively analyzed single-core
and single-node IO performance on the Haswell and KNL
partitions, and have identified the major bottlenecks, which
include CPU frequencies and memory copy performance. We
have also extended our performance tests to multi-node IO
and revealed the IO cost difference caused by network latency,
buffer size, and communication cost. Overall, we have developed
a strong understanding of the IO gap between Haswell and KNL
nodes, and the lessons learned from this exploration will guide
us in designing optimal IO solutions in the many-core era.