eScholarship
Open Access Publications from the University of California

Helping Faculty Teach Software Performance Engineering

(2024)

Over the academic year 2022–23, we discussed the teaching of software performance engineering with more than a dozen faculty across North America and beyond. Our outreach was centered on research-focused faculty with an existing interest in this course material. These discussions revealed an enthusiasm for making software performance engineering a more prominent part of a curriculum for computer scientists and engineers. Here, we discuss how MIT’s longstanding efforts in this area may serve as a launching point for community development of a software performance engineering curriculum, challenges in and solutions for providing the necessary infrastructure to universities, and future directions.

Parallel Runtime Interface for Fortran (PRIF) Specification, Revision 0.3

(2024)

This document specifies an interface to support the parallel features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a proposed solution in which the runtime library is responsible for coarray allocation, deallocation, and accesses, image synchronization, atomic operations, events, and teams. In this interface, the compiler is responsible for transforming the invocation of Fortran-level parallel features into procedure calls to the necessary PRIF procedures. The interface is designed for portability across shared- and distributed-memory machines, different operating systems, and multiple architectures. Implementations of this interface are intended as an augmentation for the compiler's own runtime library. With an implementation-agnostic interface, alternative parallel runtime libraries may be developed that support the same interface. One benefit of this approach is the ability to vary the communication substrate. A central aim of this document is to define a parallel runtime interface in standard Fortran syntax, which enables us to leverage Fortran to succinctly express various properties of the procedure interfaces, including argument attributes.
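
PRIF itself is specified in standard Fortran, but the division of labor it describes, in which a compiler lowers parallel language features into calls on an implementation-agnostic runtime interface, can be sketched in any language. The following Python sketch is purely illustrative: ParallelRuntime, SingleImageRuntime, num_images, and sync_all are hypothetical names chosen for this sketch, not PRIF procedures.

# Illustrative sketch only: PRIF is a Fortran interface; these class and
# method names are hypothetical and do not match the PRIF specification.
from abc import ABC, abstractmethod

class ParallelRuntime(ABC):
    """Stand-in for an implementation-agnostic parallel runtime interface."""

    @abstractmethod
    def num_images(self) -> int: ...

    @abstractmethod
    def sync_all(self) -> None: ...

class SingleImageRuntime(ParallelRuntime):
    """Trivial 'communication substrate': one image, no real synchronization."""

    def num_images(self) -> int:
        return 1

    def sync_all(self) -> None:
        pass  # nothing to synchronize with a single image

def lowered_program(rt: ParallelRuntime) -> None:
    # What a compiler might emit for Fortran's `num_images()` and `sync all`:
    # the parallel features become plain procedure calls on the runtime, so
    # any runtime implementing the interface can be substituted.
    print(f"running on {rt.num_images()} image(s)")
    rt.sync_all()

lowered_program(SingleImageRuntime())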

AutoCT: Automated CT registration, segmentation, and quantification

(2024)

The processing and analysis of computed tomography (CT) imaging is important for both basic scientific development and clinical applications. In AutoCT, we provide a comprehensive pipeline that integrates end-to-end automatic preprocessing, registration, segmentation, and quantitative analysis of 3D CT scans. The engineered pipeline enables atlas-based CT segmentation and quantification, leveraging diffeomorphic transformations through efficient forward and inverse mappings. The localized features extracted from the deformation field allow for downstream statistical learning that may facilitate medical diagnostics. Built on a lightweight and portable software platform, AutoCT provides a new toolkit for the CT imaging community to underpin the deployment of artificial intelligence-driven applications.
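
As a rough illustration of the pipeline shape described above, the following Python sketch chains hypothetical preprocess, register, segment, and quantify steps; none of these names come from the actual AutoCT API, and the registration step is faked with identity maps.

# Hypothetical sketch of the pipeline shape only; the real AutoCT API differs.
import numpy as np

def preprocess(scan: np.ndarray) -> np.ndarray:
    # Placeholder preprocessing: simple intensity normalization.
    return (scan - scan.mean()) / (scan.std() + 1e-8)

def register_to_atlas(scan: np.ndarray, atlas: np.ndarray):
    # Diffeomorphic registration yields forward/inverse maps; faked here
    # with zero (identity) displacement fields of the scan's shape.
    forward = np.zeros(scan.shape + (3,))
    inverse = np.zeros(scan.shape + (3,))
    return forward, inverse

def segment(scan: np.ndarray, atlas_labels: np.ndarray, inverse_map: np.ndarray):
    # Atlas-based segmentation: pull atlas labels back through the inverse
    # map. With the toy identity map this just returns the atlas labels.
    return atlas_labels

def quantify(labels: np.ndarray, deformation: np.ndarray) -> dict:
    # Localized features from the deformation field, e.g., per-label mean
    # displacement magnitude, usable for downstream statistical learning.
    mags = np.linalg.norm(deformation, axis=-1)
    return {int(l): float(mags[labels == l].mean()) for l in np.unique(labels)}

scan = np.random.rand(8, 8, 8)
atlas = np.random.rand(8, 8, 8)
atlas_labels = np.random.randint(0, 3, size=(8, 8, 8))
fwd, inv = register_to_atlas(preprocess(scan), atlas)
seg = segment(scan, atlas_labels, inv)
print(quantify(seg, fwd))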

A substitutional quantum defect in WS2 discovered by high-throughput computational screening and fabricated by site-selective STM manipulation

(2024)

Point defects in two-dimensional materials are of key interest for quantum information science. However, the parameter space of possible defects is immense, making the identification of high-performance quantum defects very challenging. Here, we perform high-throughput (HT) first-principles computational screening to search for promising quantum defects within WS2 that present localized levels in the band gap and can lead to bright optical transitions in the visible or telecom regime. Our computed database spans more than 700 charged defects formed through substitution on the tungsten or sulfur site. We find that sulfur substitutions enable the most promising quantum defects. We computationally identify the neutral cobalt substitution to sulfur (Co_S^0) and fabricate it with scanning tunneling microscopy (STM). The Co_S^0 electronic structure measured by STM agrees with first-principles calculations and showcases an attractive quantum defect. Our work shows how HT computational screening and nanoscale synthesis routes can be combined to design promising quantum defects.
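
The screening step can be caricatured as a filter over a database of computed candidates. The sketch below is hypothetical: the field names, the band gap value, the "visible or telecom" window, and the example entries are illustrative assumptions, not values from the paper's database.

# Toy sketch of high-throughput screening logic; all values are illustrative.
from dataclasses import dataclass

@dataclass
class DefectCandidate:
    name: str
    site: str               # "W" or "S" substitution site
    level_in_gap_eV: float  # defect level position relative to the VBM
    transition_eV: float    # computed optical transition energy

def is_promising(d: DefectCandidate, gap_eV: float = 2.0) -> bool:
    # Keep candidates with a localized level inside the gap and a bright
    # transition roughly in the telecom-to-visible window (0.8 to 3.1 eV).
    in_gap = 0.0 < d.level_in_gap_eV < gap_eV
    bright_window = 0.8 <= d.transition_eV <= 3.1
    return in_gap and bright_window

candidates = [
    DefectCandidate("Co_S^0", "S", 1.0, 1.2),  # illustrative values only
    DefectCandidate("X_W^0", "W", -0.1, 0.4),  # hypothetical rejected entry
]
print([d.name for d in candidates if is_promising(d)])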

A unifying perspective on non-stationary kernels for deeper Gaussian processes

(2024)

The Gaussian process (GP) is a popular statistical technique for stochastic function approximation and uncertainty quantification from data. GPs have been adopted into the realm of machine learning (ML) in the last two decades because of their superior prediction abilities, especially in data-sparse scenarios, and their inherent ability to provide robust uncertainty estimates. Even so, their performance depends heavily on intricate customizations of the core methodology, which often leads to dissatisfaction among practitioners when standard setups and off-the-shelf software tools are deployed. Arguably, the most important building block of a GP is the kernel function, which assumes the role of a covariance operator. Stationary kernels of the Matérn class are used in the vast majority of applied studies; poor prediction performance and unrealistic uncertainty quantification are often the consequences. Non-stationary kernels show improved performance but are rarely used due to their more complicated functional form and the associated effort and expertise needed to define and tune them optimally. In this perspective, we want to help ML practitioners make sense of some of the most common forms of non-stationarity for Gaussian processes. We show a variety of kernels in action using representative datasets, carefully study their properties, and compare their performances. Based on our findings, we propose a new kernel that combines some of the identified advantages of existing kernels.
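
One standard construction that makes non-stationarity concrete (though not necessarily the kernel proposed in this paper) is the Gibbs kernel, in which the lengthscale l(x) varies with input location; the lengthscale function below is an arbitrary illustrative choice.

# Gibbs kernel: a classic non-stationary kernel with input-dependent
# lengthscale l(x). Shown for illustration; not the paper's proposed kernel.
import numpy as np

def lengthscale(x: np.ndarray) -> np.ndarray:
    # Hypothetical choice: shorter lengthscale near x = 0, longer far away.
    return 0.2 + 0.5 * np.abs(x)

def gibbs_kernel(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    l1 = lengthscale(x1)[:, None]   # shape (n, 1)
    l2 = lengthscale(x2)[None, :]   # shape (1, m)
    denom = l1**2 + l2**2
    prefactor = np.sqrt(2.0 * l1 * l2 / denom)
    sq_dist = (x1[:, None] - x2[None, :])**2
    return prefactor * np.exp(-sq_dist / denom)

x = np.linspace(-1.0, 1.0, 5)
K = gibbs_kernel(x, x)
# A valid covariance: symmetric and positive semi-definite (up to roundoff).
print(np.allclose(K, K.T), np.all(np.linalg.eigvalsh(K) > -1e-10))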

ExaWind: Open‐source CFD for hybrid‐RANS/LES geometry‐resolved wind turbine simulations in atmospheric flows

(2024)

Predictive high-fidelity modeling of wind turbines with computational fluid dynamics, wherein turbine geometry is resolved in an atmospheric boundary layer, is important for understanding the complex flows that bear on design strategies and operational phenomena such as blade erosion, pitch control, stall/vortex-induced vibrations, and aftermarket add-ons. The biggest challenge with high-fidelity modeling is the realization of numerical algorithms that can capture the relevant physics in detail through effective use of high-performance computing. For modern supercomputers, that means relying on GPUs for acceleration. In this paper, we present ExaWind, a GPU-enabled open-source incompressible-flow hybrid computational-fluid-dynamics framework comprising the near-body unstructured-grid solver Nalu-Wind and the off-body block-structured-grid solver AMR-Wind, which are coupled using the Topology Independent Overset Grid Assembler. Turbine simulations employ either a pure Reynolds-averaged Navier–Stokes (RANS) turbulence model or hybrid turbulence modeling wherein RANS is used for near-body flow and large eddy simulation (LES) is used for off-body flow. Being two-way coupled through overset grids, the two solvers enable simulation of flows across a huge range of length scales, spanning roughly 10 orders of magnitude from O(μm) boundary layers along the blades to O(10 km) across a wind farm. We describe the numerical algorithms for geometry-resolved turbine simulations in atmospheric boundary layers using ExaWind, present verification studies using canonical flow problems, and present validation studies using megawatt-scale turbines established in the literature. Additionally, we present demonstration simulations of a small wind farm under atmospheric inflow with different stability states.
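
The near-body/off-body split can be caricatured as a wall-distance blend between a RANS eddy viscosity and an LES subgrid viscosity. The Python sketch below is a toy illustration only; the blending function and both viscosity models are invented placeholders, not ExaWind's actual turbulence closures.

# Toy wall-distance blend of RANS and LES eddy viscosities; all models and
# constants here are hypothetical placeholders, not ExaWind's closures.
import numpy as np

def nu_t_rans(wall_dist: np.ndarray) -> np.ndarray:
    # Placeholder near-wall RANS eddy viscosity, growing linearly off the wall.
    return 1e-3 * wall_dist

def nu_t_les(delta: float = 0.1, strain: float = 10.0) -> float:
    # Smagorinsky-type subgrid viscosity: nu_t = (Cs * delta)^2 * |S|.
    Cs = 0.16
    return (Cs * delta) ** 2 * strain

def blended_nu_t(wall_dist: np.ndarray, d_switch: float = 0.5) -> np.ndarray:
    # Smooth hypothetical blend: RANS near the body, LES far from it.
    w = 0.5 * (1.0 + np.tanh((wall_dist - d_switch) / (0.1 * d_switch)))
    return (1.0 - w) * nu_t_rans(wall_dist) + w * nu_t_les()

print(blended_nu_t(np.array([0.01, 0.5, 5.0])))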

TRAVOLTA: GPU acceleration and algorithmic improvements for constructing quantum optimal control fields in photo-excited systems

(2024)

We present an open-source software package, TRAVOLTA (Terrific Refinements to Accelerate, Validate, and Optimize Large Time-dependent Algorithms), for carrying out massively parallelized quantum optimal control calculations on GPUs. The TRAVOLTA software package is a significant overhaul of our previous NIC-CAGE algorithm and also includes algorithmic improvements to the gradient ascent procedure to enable faster convergence. We examine three different variants of GPU parallelization to assess their performance in constructing optimal control fields in a variety of quantum systems. In addition, we provide several examples with extensive benchmarks of our GPU-enhanced TRAVOLTA code to show that it generates the same results as previous CPU-based algorithms with a speedup of more than a factor of ten. Our GPU enhancements and algorithmic improvements enable large quantum optimal control calculations that can be efficiently and routinely executed on modern multi-core computational hardware.

Program summary:
Program Title: TRAVOLTA
CPC Library link to program files: https://doi.org/10.17632/grwppm37rn.1
Licensing provisions: GNU General Public License 3
Programming language: C++, openBLAS, and CUDA
Supplementary material: Brief review of LU decomposition, raw numerical values used to generate Fig. 6 in the main text, and input examples for the TRAVOLTA software package.
Nature of problem: The TRAVOLTA software package utilizes GPU-accelerated routines and new algorithmic improvements to compute optimized electric fields that can drive a system from a known initial vibrational eigenstate to a specified final quantum state with a large (≈1) transition probability.
Solution method: Quantum control, GPU acceleration, analytic gradients, Crank-Nicolson propagation, and gradient ascent optimization.
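
The solution method names Crank-Nicolson propagation and gradient ascent; as a minimal sketch of the propagation piece alone (toy Hamiltonian and dipole, ħ = 1, no optimization loop), one Crank-Nicolson step and the resulting transition probability look like this in Python:

# Minimal Crank-Nicolson propagation sketch with a toy Hamiltonian; the
# field parameterization and gradient-ascent loop of TRAVOLTA are omitted.
import numpy as np

n, dt = 64, 0.01
H0 = np.diag(np.arange(n, dtype=float))                        # toy field-free Hamiltonian
mu = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # toy dipole operator

def cn_step(psi: np.ndarray, eps_t: float) -> np.ndarray:
    """One Crank-Nicolson step under H(t) = H0 - eps(t) * mu (hbar = 1)."""
    H = H0 - eps_t * mu
    A = np.eye(n) + 0.5j * dt * H   # (I + i dt H / 2)
    B = np.eye(n) - 0.5j * dt * H   # (I - i dt H / 2)
    return np.linalg.solve(A, B @ psi)

psi = np.zeros(n, dtype=complex); psi[0] = 1.0   # known initial eigenstate
for step in range(1000):
    psi = cn_step(psi, eps_t=0.05 * np.sin(0.5 * step * dt))  # fixed trial field
target = np.zeros(n, dtype=complex); target[1] = 1.0
print("transition probability:", abs(target.conj() @ psi) ** 2)

In the full method, gradient ascent would adjust the field values eps(t) at each time step to push this transition probability toward 1.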

Anthropogenic aerosols mask increases in US rainfall by greenhouse gases.

(2024)

A comprehensive understanding of human-induced changes to rainfall is essential for water resource management and infrastructure design. However, at regional scales, existing detection and attribution studies are rarely able to conclusively identify human influence on precipitation. Here we show that anthropogenic aerosol and greenhouse gas (GHG) emissions are the primary drivers of precipitation change over the United States. In rain gauge measurements, GHG emissions increase mean and extreme precipitation across all seasons, while the decadal-scale effect of global aerosol emissions decreases precipitation. Local aerosol emissions further offset GHG increases in the winter and spring but enhance rainfall during the summer and fall. Our results show that the conflicting literature on historical precipitation trends can be explained by offsetting aerosol and greenhouse gas signals. At the scale of the United States, individual climate models reproduce observed changes but cannot confidently determine whether a given anthropogenic agent has increased or decreased rainfall.

The growing inadequacy of an open-ended Saffir-Simpson hurricane wind scale in a warming world.

(2024)

Global warming increases available sensible and latent heat energy, increasing the thermodynamic potential wind intensity of tropical cyclones (TCs). Supported by theory, observations, and modeling, this causes a shift in mean TC intensity, which tends to manifest most clearly at the greatest intensities. The Saffir-Simpson scale for categorizing damage based on the wind intensity of TCs was introduced in the early 1970s and remains the most commonly used metric for public communication of the level of wind hazard that a TC poses. Because the scale is open-ended and does not extend beyond category 5 (wind speeds of 70 m/s or greater), the level of wind hazard conveyed by the scale remains constant regardless of how far the intensity extends beyond 70 m/s. This may be considered a weakness of the scale, particularly considering that the destructive potential of the wind increases exponentially. Here, we consider how this weakness becomes amplified in a warming world by elucidating the past and future increases of peak wind speeds in the most intense TCs. A simple extrapolation of the Saffir-Simpson scale is used to define a hypothetical category 6, and we describe the frequency of TCs, both past and projected under global warming, that would fall under this category. We find that a number of recent storms have already achieved this hypothetical category 6 intensity and, based on multiple independent lines of evidence examining the highest simulated and potential peak wind speeds, that more such storms are projected as the climate continues to warm.
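
Using the published Saffir-Simpson thresholds (1-minute sustained winds of roughly 33, 43, 50, 58, and 70 m/s for categories 1 through 5), such an extended scale can be written down directly; the 86 m/s category 6 cutoff below is an assumed illustrative value, not necessarily the threshold used in the paper.

# Saffir-Simpson category from sustained wind speed (m/s), extended with a
# hypothetical category 6. The 86 m/s cutoff is an assumed illustrative
# extrapolation; the paper's exact threshold may differ.
def saffir_simpson_category(wind_ms: float) -> int:
    thresholds = [(86, 6), (70, 5), (58, 4), (50, 3), (43, 2), (33, 1)]
    for cutoff, category in thresholds:
        if wind_ms >= cutoff:
            return category
    return 0  # below hurricane strength

for wind in (45, 72, 90):
    print(wind, "m/s -> category", saffir_simpson_category(wind))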