eScholarship
Open Access Publications from the University of California
Constructing a Simulation Surrogate with Partially Observed Output

(2024)

Gaussian process surrogates are a popular alternative to directly using computationally expensive simulation models. When the simulation output consists of many responses, dimension-reduction techniques are often employed to construct these surrogates. However, surrogate methods with dimension reduction generally rely on complete output training data. This article proposes a new Gaussian process surrogate method that permits the use of partially observed output while remaining computationally efficient. The new method involves the imputation of missing values and the adjustment of the covariance matrix used for Gaussian process inference. The resulting surrogate represents the available responses, disregards the missing responses, and provides meaningful uncertainty quantification. The proposed approach is shown to offer sharper inference than alternatives in a simulation study and a case study where an energy density functional model that frequently returns incomplete output is calibrated.

The MyESnet Portal: Making the Network Visible

(2012)

ESnet provides a platform for moving large data sets and accelerating worldwide scientific collaboration. It provides high-bandwidth, reliable connections that link scientists at national laboratories, universities, and other research institutions, enabling them to collaborate on some of the world's most important scientific challenges, including renewable energy sources, climate science, and the origins of the universe. ESnet has embarked on a major project to provide substantial visibility into the inner workings of the network by aggregating diverse data sources, exposing them via web services, and visualizing them with user-centered interfaces. The portal's strategy is driven by understanding the needs and requirements of ESnet's user community and carefully providing interfaces to the data to meet those needs. The 'MyESnet Portal' allows users to monitor, troubleshoot, and understand the real-time operations of the network and its associated services. This paper describes the MyESnet portal and the process of developing it. The data for the portal come from a wide variety of sources: homegrown systems, commercial products, and even peer networks. Some visualizations from the portal are presented, highlighting interesting and unusual cases such as power consumption and flow data. Developing effective user interfaces is an iterative process. When a new feature is released, users are both interviewed and observed using the site. This process yielded valuable insights into what is important to the users and what other features and services they may want. Open source tools were used to build the portal, and the pros and cons of these tools are discussed.

Sequential Bayesian experimental design for calibration of expensive simulation models

(2023)

Simulation models of critical systems often have parameters that need to be calibrated using observed data. For expensive simulation models, calibration is done using an emulator of the simulation model built on simulation output at different parameter settings. Using intelligent and adaptive selection of parameters to build the emulator can drastically improve the efficiency of the calibration process. The article proposes a sequential framework with a novel criterion for parameter selection that targets learning the posterior density of the parameters. The emergent behavior of this criterion is that exploration occurs by selecting parameters in uncertain posterior regions, while exploitation simultaneously occurs by selecting parameters in regions of high posterior density. The advantages of the proposed method are illustrated using several simulation experiments and a nuclear physics reaction model.

Numerical evidence against advantage with quantum fidelity kernels on classical data

(2023)

Quantum machine learning techniques are commonly considered one of the most promising candidates for demonstrating practical quantum advantage. In particular, quantum kernel methods have been demonstrated to be able to learn certain classically intractable functions efficiently if the kernel is well aligned with the target function. In the more general case, quantum kernels are known to suffer from exponential "flattening" of the spectrum as the number of qubits grows, preventing generalization and necessitating the control of the inductive bias by hyperparameters. We show that the general-purpose hyperparameter-tuning techniques proposed to improve the generalization of quantum kernels lead to the kernel becoming well approximated by a classical kernel, removing the possibility of quantum advantage. We provide extensive numerical evidence for this phenomenon utilizing multiple previously studied quantum feature maps and both synthetic and real data. Our results show that unless novel techniques are developed to control the inductive bias of quantum kernels, they are unlikely to provide a quantum advantage on classical data that lacks special structure.
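
The two effects named in this abstract can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a deliberately simple feature map (one data feature per qubit, encoded by a single-qubit rotation with no entanglement), for which the fidelity kernel has the closed form of a product of cos² terms and is classically computable. The function names and the `bandwidth` parameter are hypothetical. Under these assumptions, the average off-diagonal kernel value shrinks as qubits are added, and a small bandwidth makes the kernel nearly indistinguishable from a classical Gaussian kernel.

```python
import numpy as np

def fidelity_kernel(X, bandwidth=1.0):
    # Assumed toy feature map: each feature rotates its own qubit, so the
    # state overlap factorizes and the fidelity is a product of cos^2 terms.
    diff = X[:, None, :] - X[None, :, :]
    return np.prod(np.cos(bandwidth * diff / 2.0) ** 2, axis=2)

def gaussian_kernel(X, bandwidth=1.0):
    # Classical kernel matching the small-angle expansion of the one above.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-(bandwidth ** 2) * sq_dist / 4.0)

def mean_off_diagonal(K):
    # Average kernel value between distinct data points.
    n = K.shape[0]
    return (K.sum() - np.trace(K)) / (n * (n - 1))

rng = np.random.default_rng(0)

# Effect 1: with more qubits, distinct points look nearly orthogonal,
# so the kernel matrix approaches the identity.
flattening = {}
for n_qubits in (2, 6, 12):
    X = rng.uniform(0.0, 2.0 * np.pi, size=(50, n_qubits))
    flattening[n_qubits] = mean_off_diagonal(fidelity_kernel(X))

# Effect 2: shrinking the bandwidth keeps kernel values away from zero,
# but the result is then close to a classical Gaussian kernel.
X_small = rng.uniform(0.0, 2.0 * np.pi, size=(50, 4))
gap = np.max(np.abs(fidelity_kernel(X_small, bandwidth=0.1)
                    - gaussian_kernel(X_small, bandwidth=0.1)))

print(flattening)
print(gap)
```

The feature maps studied in the paper are more general than this product-state example, so the sketch only conveys the qualitative behavior.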

Derivative-free optimization of a rapid-cycling synchrotron

(2023)

We develop and solve a constrained optimization model to identify an integrable optics rapid-cycling synchrotron lattice design that performs well in several capacities. Our model encodes the design criteria into 78 linear and nonlinear constraints, as well as a single nonsmooth objective, where the objective and some constraints are defined from the output of Synergia, an accelerator simulator. We detail the difficulties of optimizing within the 32-dimensional, simulation-constrained decision space and establish that the space is nonempty. We use a derivative-free manifold sampling algorithm to account for structured nondifferentiability in the objective function. Our numerical results quantify the dependence of approximate solutions on constraint parameters and the effect of the form of the objective function.

Adaptive sampling quasi-Newton methods for zeroth-order stochastic optimization

(2023)

We consider unconstrained stochastic optimization problems with no available gradient information. Such problems arise in settings from derivative-free simulation optimization to reinforcement learning. We propose an adaptive sampling quasi-Newton method where we estimate the gradients using finite differences of stochastic function evaluations within a common random number framework. We develop modified versions of a norm test and an inner product quasi-Newton test to control the sample sizes used in the stochastic approximations and provide global convergence results to the neighborhood of a locally optimal solution. We present numerical experiments on simulation optimization problems to illustrate the performance of the proposed algorithm. When compared with classical zeroth-order stochastic gradient methods, we observe that our strategies of adapting the sample sizes significantly improve performance in terms of the number of stochastic function evaluations required.
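
The gradient-estimation step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the test objective, the function names, the step size `h`, and the fixed sample size are all assumptions, and the adaptive sample-size tests and quasi-Newton updates are omitted. The point shown is the common random number idea: each forward difference evaluates the perturbed and unperturbed points with the same random seed, so the shared noise largely cancels.

```python
import numpy as np

def sample_objective(x, seed, sigma=0.5):
    # Hypothetical stochastic objective F(x, xi) = ||x - xi||^2 with
    # xi ~ N(0, sigma^2 I). Its expectation has gradient 2 * x.
    rng = np.random.default_rng(seed)
    xi = rng.normal(scale=sigma, size=x.shape)
    return float(np.sum((x - xi) ** 2))

def fd_gradient_crn(x, seeds, h=1e-3):
    # Forward-difference gradient estimate averaged over a batch of seeds.
    # Reusing one seed for both evaluations in each difference is the
    # common-random-number device.
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for seed in seeds:
        base = sample_objective(x, seed)
        for i in range(x.size):
            stepped = x.copy()
            stepped[i] += h
            grad[i] += (sample_objective(stepped, seed) - base) / h
    return grad / len(seeds)

x0 = np.array([1.0, -2.0, 0.5])
estimate = fd_gradient_crn(x0, seeds=range(200))
print(estimate)  # should lie near the true gradient 2 * x0
```

If the two evaluations in each difference used independent seeds instead, the noise would be divided by `h` rather than cancelled, and the estimate would be far less accurate for the same number of function evaluations.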

DeepAstroUDA: semi-supervised universal domain adaptation for cross-survey galaxy morphology classification and anomaly detection

(2023)

Artificial intelligence methods show great promise in increasing the quality and speed of work with large astronomical datasets, but the high complexity of these methods leads to the extraction of dataset-specific, non-robust features. Therefore, such methods do not generalize well across multiple datasets. We present a universal domain adaptation method, DeepAstroUDA, as an approach to overcome this challenge. This algorithm performs semi-supervised domain adaptation (DA) and can be applied to datasets with different data distributions and class overlaps. Non-overlapping classes can be present in either of the two datasets (the labeled source domain or the unlabeled target domain), and the method can even be used in the presence of unknown classes. We apply our method to three examples of galaxy morphology classification tasks of different complexities (three-class and ten-class problems), with anomaly detection: (1) datasets created after different numbers of observing years from a single survey (Legacy Survey of Space and Time mock data of one and ten years of observations); (2) data from different surveys (Sloan Digital Sky Survey (SDSS) and DECaLS); and (3) data from observing fields with different depths within one survey (wide field and Stripe 82 deep field of SDSS). For the first time, we demonstrate the successful use of DA between very discrepant observational datasets. DeepAstroUDA is capable of bridging the gap between two astronomical surveys, increasing classification accuracy in both domains (up to 40% on the unlabeled data), and making model performance consistent across datasets. Furthermore, our method also performs well as an anomaly detection algorithm and successfully clusters unknown class samples even in the unlabeled target dataset.