Scholarly Works (6 results)

Peer Reviewed

Many Cores for the Masses: Lessons Learned from Application Readiness Efforts at NERSC for the Knights Landing based Cori System

LBL Publications (2016)

Intel HPC Developers Conference, Salt Lake City, UT

Article
Peer Reviewed

Fusion PIC Code Performance Analysis on The Cori KNL System

LBL Publications (2018)

We study the attainable performance of Particle-In-Cell (PIC) codes on the Cori KNL system by analyzing a miniature particle push application based on the fusion PIC code XGC1. We start from the most basic building blocks of a PIC code and build up the complexity to identify the kernels that cost the most in performance, then focus optimization efforts there. Particle push kernels operate at high arithmetic intensity (AI) and are not likely to be memory bandwidth or even cache bandwidth bound on KNL. Therefore, we see only minor benefits from the high bandwidth memory available on KNL. Achieving good vectorization is the most beneficial optimization path: it can theoretically yield up to an 8x speedup on KNL, but in practice the data layout limits the gain to 4x.

Article
Peer Reviewed

A molecular-MNIST dataset for machine learning study on diffraction imaging and microscopy

LBL Publications (2020)

An image dataset of 10 molecules of different sizes, each with 2,000 structural variants, is generated from 2D cross-sectional projections of Molecular Dynamics trajectories. The dataset is intended as a benchmark to meet the growing need for machine learning, deep learning, and image processing in the study of scattering, imaging, and microscopy.

Article
Peer Reviewed

Fusion PIC code performance analysis on the Cori KNL system

LBL Publications (2017)

We study the attainable performance of Particle-In-Cell (PIC) codes on the Cori KNL system by analyzing a miniature particle push application based on the fusion PIC code XGC1. We start from the most basic building blocks of a PIC code and build up the complexity to identify the kernels that cost the most in performance, then focus optimization efforts there. Particle push kernels operate at high arithmetic intensity (AI) and are not likely to be memory bandwidth or even cache bandwidth bound on KNL. Therefore, we see only minor benefits from the high bandwidth memory available on KNL, and achieving good vectorization is shown to be the most beneficial optimization path, with a theoretical yield of up to 8x speedup on KNL. In practice we are able to obtain up to a 4x gain from vectorization due to limitations set by the data layout and memory latency.

Article
Peer Reviewed

International Workshop on OpenMP (IWOMP)

UCLA Previously Published Works (2023)

We investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification algorithms in serial mode. The methods leverage the Nyström extension to calculate the eigenvalues and eigenvectors of the graph Laplacian; this is a self-contained module that can be used in conjunction with other graph-Laplacian based methods such as spectral clustering. We use performance tools to identify the hotspots and memory access patterns of the serial codes and use OpenMP to parallelize the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations and detail the performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel’s Knights Corner and Knights Landing processors. We show both performance improvement and strong scaling behavior. A large number of optimization techniques and analyses are necessary before the algorithm reaches almost ideal scaling.

Article
Peer Reviewed

Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies.

LBL Publications (2017)