Open Access Publications from the University of California

LBL Publications

Lawrence Berkeley National Laboratory (Berkeley Lab) has been a leader in science and engineering research for more than 70 years. Located on a 200-acre site in the hills above the Berkeley campus of the University of California, overlooking the San Francisco Bay, Berkeley Lab is a U.S. Department of Energy (DOE) National Laboratory managed by the University of California. It has an annual budget of nearly $480 million (FY2002) and employs a staff of about 4,300, including more than a thousand students.

Berkeley Lab conducts unclassified research across a wide range of scientific disciplines with key efforts in fundamental studies of the universe; quantitative biology; nanoscience; new energy systems and environmental solutions; and the use of integrated computing as a tool for discovery. It is organized into 17 scientific divisions and hosts four DOE national user facilities.


Adaptively driven X-ray diffraction guided by machine learning for autonomous phase identification


Machine learning (ML) has become a valuable tool to assist and improve materials characterization, enabling automated interpretation of experimental results with techniques such as X-ray diffraction (XRD) and electron microscopy. Because ML models are fast once trained, there is a key opportunity to bring interpretation in-line with experiments and make on-the-fly decisions to achieve optimal measurement effectiveness, enabling rapid learning and information extraction from experiments. Here, we demonstrate such a capability with the development of autonomous and adaptive XRD. By coupling an ML algorithm with a physical diffractometer, this method integrates diffraction and analysis such that early experimental information is leveraged to steer measurements toward features that improve the confidence of a model trained to identify crystalline phases. We validate the effectiveness of an adaptive approach by showing that ML-driven XRD can accurately detect trace amounts of materials in multi-phase mixtures with short measurement times. The improved speed of phase detection also enables in situ identification of short-lived intermediate phases formed during solid-state reactions using a standard in-house diffractometer. Our findings showcase the advantages of in-line ML for materials characterization and point to the possibility of more general approaches for adaptive experimentation.
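As a sketch of the adaptive idea (not the authors' implementation; the function names, the toy instrument, and the uncertainty-update rule are all hypothetical), a loop like the following steers each new measurement toward the angle about which the model is currently least certain:

```python
# Hypothetical sketch of an adaptive XRD measurement loop. A per-angle
# "uncertainty" score steers which 2-theta angle is measured next, so
# the scan concentrates on informative regions instead of sweeping.

def measure(angle):
    # Stand-in for the diffractometer: a fake pattern with one peak.
    return 100.0 if 30 <= angle <= 32 else 5.0

def update_uncertainty(uncertainty, angle, intensity):
    # Toy update: a measured angle has no remaining uncertainty, and a
    # strong peak raises interest in its neighbouring angles.
    uncertainty[angle] = 0.0
    if intensity > 50:
        for nb in (angle - 1, angle + 1):
            if nb in uncertainty:
                uncertainty[nb] += 1.0
    return uncertainty

def adaptive_scan(angles, budget):
    uncertainty = {a: 1.0 for a in angles}  # start uniform
    pattern = {}
    # Coarse pre-scan of every 5th angle to seed the uncertainty map.
    for a in angles[::5]:
        pattern[a] = measure(a)
        uncertainty = update_uncertainty(uncertainty, a, pattern[a])
    # Spend the remaining budget where the model is least certain.
    for _ in range(budget):
        nxt = max(uncertainty, key=uncertainty.get)
        pattern[nxt] = measure(nxt)
        uncertainty = update_uncertainty(uncertainty, nxt, pattern[nxt])
    return pattern

pattern = adaptive_scan(list(range(20, 41)), budget=8)
```

With only 13 measurements out of 21 candidate angles, the loop resolves the full peak region (30–32) because early strong counts pull subsequent measurements toward their neighbours.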


Reversal of spin-polarization near the Fermi level of the Rashba semiconductor BiTeCl


Spin–orbit coupling forms the physical basis for quantum materials with non-trivial topology and potential spintronics applications. The Rashba interaction is a textbook model of spin–orbit interactions, with charge carriers undergoing linear, isotropic spin-splitting in momentum space. Recently, non-centrosymmetric semiconductors in the family BiTeX (X = Cl, Br, I) have been identified as exemplary Rashba materials due to the strong splitting of their bulk bands, yet a detailed investigation of their spin textures, and their relationships to local crystal symmetry, is currently lacking. We perform high-efficiency spin-resolved photoemission spectroscopy to directly image the spin texture of surface states of BiTeCl, and we find dramatic deviations from idealized behavior, including a reversal of the spin-polarization near the Fermi level. We show that this behavior can be described by higher-order contributions to the canonical Rashba model with the surface states localized to individual trilayers of the crystal. Due to the prominence of these effects near the Fermi level, they should have a strong impact on the spin-dependent transport of carriers.
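For context, the canonical Rashba term and one commonly used higher-order extension can be written as follows (a standard textbook form, not taken from this paper):

```latex
% Canonical Rashba Hamiltonian for a 2D surface state:
% linear, isotropic spin-splitting in momentum space.
H_R = \frac{\hbar^2 k^2}{2m^*} + \alpha_R \left( k_y \sigma_x - k_x \sigma_y \right)
% A common cubic correction allowed by C_{3v} surface symmetry, which
% warps the spin texture and tilts spins out of plane:
H_3 = \frac{\lambda}{2} \left( k_+^3 + k_-^3 \right) \sigma_z,
\qquad k_\pm = k_x \pm i k_y
```

Terms of this kind grow rapidly with momentum, which is consistent with the abstract's observation that deviations from the idealized linear Rashba behavior become prominent near the Fermi level.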

Defect engineering of silicon with ion pulses from laser acceleration


Defect engineering is foundational to classical electronic device development and for emerging quantum devices. Here, we report on defect engineering of silicon with ion pulses from a laser accelerator in the laser intensity range of 10¹⁹ W cm⁻² and ion flux levels of up to 10²² ions cm⁻² s⁻¹, about five orders of magnitude higher than conventional ion implanters. Low energy ions from plasma expansion of the laser-foil target are implanted near the surface and then diffuse into silicon samples locally pre-heated by high energy ions from the same laser-ion pulse. Silicon crystals exfoliate in the areas of highest energy deposition. Color centers, predominantly W and G-centers, form directly in response to ion pulses without a subsequent annealing step. We find that the linewidth of G-centers increases with high ion flux faster than the linewidth of W-centers, consistent with density functional theory calculations of their electronic structure. Intense ion pulses from a laser-accelerator drive materials far from equilibrium and enable direct local defect engineering and high flux doping of semiconductors.


Detecting lithium plating dynamics in a solid-state battery with operando X-ray computed tomography using machine learning


Operando X-ray micro-computed tomography (µCT) provides an opportunity to observe the evolution of Li structures inside pouch cells. Segmentation is an essential step in quantitatively analyzing µCT datasets but is challenging to achieve on operando Li-metal battery datasets due to the low X-ray attenuation of Li metal and the sheer size of the datasets. Herein, we report a computational approach, batteryNET, to train an Iterative Residual U-Net-based network to detect Li structures. The resulting semantic segmentation resolves changes in individual Li-related components, addressing the diverse morphologies in the dataset. In addition, visualizations of the dead Li are provided, including calculations of the volume and effective thickness of electrodes, deposited Li, and redeposited Li. We also report findings on the spatial relationships between these components. The approach provides a method for analyzing battery performance that yields insights of significant benefit to future Li-metal battery design, along with a semantic segmentation network transferable to other datasets.
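As a toy illustration of the kind of post-segmentation metrics mentioned above (the mask and voxel size are made up; this is not the batteryNET code), a component's volume and effective thickness can be computed directly from a binary 3-D segmentation mask:

```python
# Hypothetical post-segmentation metrics: voxel counts give a
# component's volume, and volume / projected footprint area gives an
# effective thickness.
import numpy as np

voxel = 1.0  # assumed voxel edge length in micrometres

# Fake 3-D segmentation mask (z, y, x): a 20x20 "deposited Li" slab,
# 3 voxels thick along z.
mask = np.zeros((10, 20, 20), dtype=bool)
mask[2:5, :, :] = True

volume = mask.sum() * voxel**3                 # µm^3 of segmented Li
footprint = mask.any(axis=0).sum() * voxel**2  # µm^2 projected area
effective_thickness = volume / footprint       # µm
```

The same three lines of arithmetic apply to any labelled component (electrode, deposited Li, redeposited Li) once its binary mask is extracted from the semantic segmentation.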

Design and implementation of I/O performance prediction scheme on HPC systems through large-scale log analysis


Large-scale high-performance computing (HPC) systems typically consist of many thousands of CPUs and storage units used by hundreds to thousands of users simultaneously. Applications from large numbers of users have diverse characteristics, such as varying computation, communication, memory, and I/O intensity. A good understanding of the performance characteristics of each user application is important for job scheduling and resource provisioning. Among these performance characteristics, I/O performance is becoming increasingly important as data sizes rapidly increase and large-scale applications, such as simulation and model training, are widely adopted. However, predicting I/O performance is difficult because I/O systems are shared among all users and involve many layers of the software and hardware stack, including the application, network interconnect, operating system, file system, and storage devices. Furthermore, updates to these layers and changes in system management policy can significantly alter the I/O behavior of applications and the entire system. To improve I/O performance prediction on HPC systems, we propose integrating information from several different system logs and developing a regression-based approach to predict the I/O performance. Our proposed scheme can dynamically select the most relevant features from the log entries using various feature selection algorithms and scoring functions, and can automatically select the regression algorithm with the best accuracy for the prediction task. The evaluation results show that our proposed scheme can predict the write performance with up to 90% prediction accuracy and the read performance with up to 99% prediction accuracy using the real logs from the Cori supercomputer system at NERSC.
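A minimal stand-in for the two-stage idea (score log-derived features, keep the most relevant, then fit a regressor on them) might look like the following; the synthetic features, the correlation-based scoring function, and the least-squares model are illustrative choices, not the paper's actual pipeline:

```python
# Hypothetical sketch: feature scoring + regression on log features.
import numpy as np

rng = np.random.default_rng(0)

# Fake "log features": columns 0-1 actually drive I/O performance,
# columns 2-4 are irrelevant noise.
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

def score_features(X, y):
    # Absolute Pearson correlation as a simple scoring function.
    return np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

def fit_predict(X_tr, y_tr, X_te):
    # Ordinary least squares with an intercept column.
    A = np.c_[np.ones(len(X_tr)), X_tr]
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.c_[np.ones(len(X_te)), X_te] @ coef

train, test = slice(0, 150), slice(150, 200)
scores = score_features(X[train], y[train])
top = np.argsort(scores)[::-1][:2]  # keep the 2 highest-scoring features

pred = fit_predict(X[train][:, top], y[train], X[test][:, top])
r2 = 1 - np.sum((y[test] - pred) ** 2) / np.sum((y[test] - y[test].mean()) ** 2)
```

In the paper's setting, the scoring and model-fitting steps would each be chosen automatically from a pool of candidates; the sketch fixes one of each to keep the structure visible.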


Learning Gaussian graphical models with latent confounders


Gaussian graphical models (GGMs) are widely used to estimate network structure in domains ranging from biology to finance. In practice, data are often corrupted by latent confounders, which bias inference of the underlying true graphical structure. In this paper, we compare and contrast two strategies for inference in graphical models with latent confounders: Gaussian graphical models with latent variables (LVGGM) and PCA-based removal of confounding (PCA+GGM). While these two approaches have similar goals, they are motivated by different assumptions about confounding. We explore the connection between the two approaches and propose a new method that combines their strengths. We prove consistency and a convergence rate for the PCA-based method and use these results to provide guidance on when to use each method. We demonstrate the effectiveness of our methodology using both simulations and two real-world applications.


Sensors show long-term dis-adoption of purchased improved cookstoves in rural India, while surveys miss it entirely


User surveys alone do not accurately measure the actual use of improved cookstoves in the field. We present the results of comparing survey-reported and sensor-recorded cooking events, or durations of use, of improved cookstoves in two monitoring studies in rural Maharashtra, India. The first was a free trial of the Berkeley-India Stove (BIS) provided to 159 households where we monitored cookstove usage for an average of 10 days (SD = 4.5) (termed “free-trial study”). In the second study, we monitored 91 households' usage of the BIS for an average of 468 days (SD = 153) after they purchased it at a subsidized price of about one third of the households' monthly income (termed “post-purchase study”). The studies lasted from February 2019 to March 2021. We found that 34% of households (n = 88) over-reported BIS usage in the free-trial study, and 46% and 28% of households over-reported BIS usage in the first (n = 75) and second (n = 69) surveys of the post-purchase study, respectively. The average over-reporting in both studies decreased when households were asked about their usage in a binary question format, but this method provided less granularity. Notably, in the post-purchase study, sensors showed that most households dis-adopted the cookstove even though they had purchased it with their own money. Surveys failed to detect the long-term declining trend in cookstove usage; in fact, surveys indicated that cookstove adoption remained unchanged during the study. Households tended to report nominal responses for use, such as 0, 7, or 14 cooking events per week (corresponding to 0, 1, or 2 times per day), indicating the difficulty of recalling exact usage over a week. Additionally, we found that surveys may also provide misleading qualitative findings on user-reported cookstove benefits without the support of sensor data, causing impact to be overestimated: some households with zero sensor-recorded usage reported cookstove fuel savings, quick cooking, and less smoke.
These findings suggest that surveys may be unreliable or insufficient to provide solid foundational data for subsidies based on the ability of a stove to reduce damage to health or reduce emissions in real-world implementations.


On data-driven energy flexibility quantification: A framework and case study


Building energy flexibility is an important resource for a sustainable and resilient power grid, and an important measure to reduce utility costs for building owners. Quantifying energy flexibility for existing buildings can provide critical insights for optimizing their operation. Data-driven methods for building energy modeling and analytics are gaining popularity due to the increasingly available sensor and meter infrastructure, affordable computational resources, and advanced modeling algorithms. However, their application in quantifying the energy flexibility of real buildings is still limited due to heterogeneous data types and limited data availability. This study proposes a framework for building-level data-driven energy flexibility quantification that considers different levels of data availability and use cases. Two case studies with real building data collected at different scales were conducted to demonstrate the proposed framework for different purposes.