eScholarship
Open Access Publications from the University of California

ORNL Cray X1 evaluation status report

Author(s):
  • Agarwal, P.K.
  • Alexander, R.A.
  • Apra, E.
  • Balay, S.
  • Bland, A.S.
  • Colgan, J.
  • D'Azevedo, E.F.
  • Dongarra, J.J.
  • Dunigan Jr., T.H.
  • Fahey, M.R.
  • Fahey, R.A.
  • Geist, A.
  • Gordon, M.
  • Harrison, R.J.
  • Kaushik, D.
  • Krishnakumar, M.
  • Luszczek, P.
  • Mezzacappa, A.
  • Nichols, J.A.
  • Nieplocha, J.
  • Oliker, L.
  • Packwood, T.
  • Pindzola, M.S.
  • Schulthess, T.C.
  • Vetter, J.S.
  • White III, J.B.
  • Windus, T.L.
  • Worley, P.H.
  • Zacharia, T.
  • et al.
Abstract

On August 15, 2002, the Department of Energy (DOE) selected the Center for Computational Sciences (CCS) at Oak Ridge National Laboratory (ORNL) to deploy a new scalable vector supercomputer architecture for solving important scientific problems in climate, fusion, biology, nanoscale materials and astrophysics. "This program is one of the first steps in an initiative designed to provide U.S. scientists with the computational power that is essential to 21st century scientific leadership," said Dr. Raymond L. Orbach, director of the department's Office of Science.

In FY03, CCS procured a 256-processor Cray X1 to evaluate the processors, memory subsystem, scalability of the architecture, and software environment, and to predict the expected sustained performance on key DOE application codes. The results of the micro-benchmarks and kernel benchmarks show the architecture of the Cray X1 to be exceptionally fast for most operations. The best results are shown on large problems, where it is not possible to fit the entire problem into the cache of the processors. These large problems are exactly the types of problems that are important for the DOE and ultra-scale simulation.

Application performance is found to be markedly improved by this architecture:

- Large-scale simulations of high-temperature superconductors run 25 times faster than on an IBM Power4 cluster using the same number of processors.
- Best performance of the Parallel Ocean Program (POP v1.4.3) is 50 percent higher than on Japan's Earth Simulator and 5 times higher than on an IBM Power4 cluster.
- A fusion application, global GYRO transport, was found to be 16 times faster on the X1 than on an IBM Power3. The increased performance allowed simulations to fully resolve questions raised by a prior study.
- The transport kernel in the AGILE-BOLTZTRAN astrophysics code runs 15 times faster than on an IBM Power4 cluster using the same number of processors.
- Molecular dynamics simulations related to the phenomenon of photon echo run 8 times faster than previously achieved.

Even at 256 processors, the Cray X1 system is already outperforming other supercomputers with thousands of processors for a certain class of applications, such as climate modeling and some fusion applications. This evaluation is the outcome of a number of meetings with both high-performance computing (HPC) system vendors and application experts over the past 9 months, and it has received broad-based support from the scientific community and other agencies.
