MILC-Dslash is a benchmark derived from the MILC code, which simulates lattice gauge theory on a four-dimensional hypercubic lattice. This paper outlines a gradual progression in the granularity of parallelism in the MILC-Dslash kernel using the SYCL programming model, transitioning from a simple to a fully parallel implementation. We explore the impact of various parallel strategies on MILC-Dslash performance on an NVIDIA A100 GPU. The investigation encompasses the different work-item index orders, work-group sizes, and memory access patterns that arise from these strategies. Components intertwined with the parallel strategies include atomic memory operations, shared variables, divergent instructions, synchronization barriers, scenarios with and without dependencies between iterations, and versions with and without the SYCL complex library (SyclCPLX) and the SYCLomatic tool. The best parallel strategy is twice as fast as the simplest strategy and 10% faster than the QUDA baseline, thanks to enhanced parallelism and the use of work-group local memory. This result, along with other findings - such as optimizing GPU resource utilization even at the expense of concurrency, preferring work-item indexing methods that yield more localized memory access patterns, and maximizing both the number of active work-items per warp and the number of consecutive active work-items - could provide valuable guidance for researchers and developers seeking to optimize parallel computing applications.