Development of Statistical and Deterministic Approaches to Uncertainty Quantification under Bound-to-Bound Data Collaboration

Abstract

This dissertation focuses on developing methods within the uncertainty quantification framework of Bound-to-Bound Data Collaboration (B2BDC). The framework systematically combines models and data to assess the consistency of a dataset (i.e., the collection of data and models) and to generate more predictive models. Uncertainties in model parameters and experimental data are characterized by deterministic bounds and propagated to bounds on prediction uncertainty, echoing the ``Bound-to-Bound'' in the name. Solution mapping techniques are used to create polynomial and rational quadratic surrogate models, with which advanced optimization techniques can be applied to compute provable bounds.
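
To illustrate the solution-mapping idea, the following minimal Python sketch fits a quadratic surrogate to sampled outputs by least squares. The design points, the stand-in simulator, and the fitting choices are illustrative assumptions, not the procedure used in the dissertation.

    import numpy as np
    from itertools import combinations_with_replacement

    def quadratic_features(X):
        # Feature matrix [1, x_i, x_i*x_j] for a full quadratic fit.
        n_samples, n_params = X.shape
        cols = [np.ones(n_samples)]
        cols += [X[:, i] for i in range(n_params)]
        cols += [X[:, i] * X[:, j]
                 for i, j in combinations_with_replacement(range(n_params), 2)]
        return np.column_stack(cols)

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(200, 3))  # design points in a scaled parameter cube
    # Stand-in for expensive simulator output at the design points.
    y = 1.0 + X @ np.array([0.5, -0.2, 0.1]) + 0.3 * X[:, 0] * X[:, 1]
    coef, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)  # surrogate coefficients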

The bound-form uncertainty adopted in the B2BDC framework naturally generates inequality constraints on model parameters. The collection of all these constraints, derived from prior knowledge about the model parameters and from requiring model predictions to fall within experimental uncertainty, defines a region in the model parameter space termed the feasible set. Agreement or disagreement among models and data is determined by inspecting whether any parameter vector lies in the feasible set: the dataset is consistent if the feasible set is nonempty and inconsistent otherwise. Numerically, dataset consistency is examined by calculating a quantity termed the scalar consistency measure, defined as the solution to a constrained optimization problem, and evaluating its sign: the dataset is inconsistent if the scalar consistency measure is negative and consistent if it is positive. Prediction uncertainty is computed by finding the minimum and maximum values of the prediction models over the feasible set. The underlying constrained optimization problems are nonconvex and, in general, difficult to solve globally, as nonlinear optimization solvers usually converge to a local optimum. With quadratic, polynomial, and rational quadratic surrogate models, convex relaxation techniques are used to derive semidefinite programming problems whose solutions can be computed efficiently and provide conservative bounds on the solutions of the original nonconvex problems. As a result, dataset inconsistency can be proved if the conservative bound on the scalar consistency measure is negative.
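
The sketch below shows a Shor-type semidefinite relaxation of this kind of problem, assuming cvxpy and its bundled SDP solver: a random quadratic surrogate is maximized over the unit parameter cube by lifting the parameter vector to a positive semidefinite matrix. The data are synthetic, and the experiment constraints on the surrogate outputs are omitted for brevity.

    import numpy as np
    import cvxpy as cp

    n = 3
    rng = np.random.default_rng(1)
    A = rng.standard_normal((n, n))
    A = 0.5 * (A + A.T)          # quadratic part of a toy surrogate
    b = rng.standard_normal(n)   # linear part

    # Lifted variable Z plays the role of [1 x'; x xx'].
    Z = cp.Variable((n + 1, n + 1), symmetric=True)
    Q0 = np.block([[np.zeros((1, 1)), 0.5 * b[None, :]],
                   [0.5 * b[:, None], A]])
    constraints = [Z >> 0, Z[0, 0] == 1]
    # Prior bounds -1 <= x_i <= 1 enter through x_i^2 <= 1, i.e. Z[i+1, i+1] <= 1.
    constraints += [Z[i + 1, i + 1] <= 1 for i in range(n)]
    prob = cp.Problem(cp.Maximize(cp.trace(Q0 @ Z)), constraints)
    upper_bound = prob.solve()   # provable upper bound on the nonconvex maximum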

Starting from the bound-form uncertainty, stronger assumptions can be made to obtain more informative results. For example, if a prior distribution is selected to represent the prior uncertainty in the model parameters and a likelihood function is selected to model the measurement error in the data, Bayesian inference produces a posterior distribution of the model parameters. Two physically inspired likelihood functions are investigated and compared in the thesis. For the likelihood functions and surrogate models considered, the posterior distribution has no closed-form expression. Therefore, efficient sampling methods are developed to generate samples from the posterior distribution for further uncertainty quantification computations, for example, evaluating the uncertainty in model predictions.
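
Since the posterior has no closed form, sampling is required; the sketch below implements a generic random-walk Metropolis sampler on a toy two-parameter posterior. The uniform prior box, the quadratic surrogate, and the Gaussian likelihood are placeholder assumptions standing in for the choices studied in the thesis.

    import numpy as np

    rng = np.random.default_rng(2)

    def log_post(x, lo=-1.0, hi=1.0):
        # Uniform prior on the parameter box: zero density outside.
        if np.any(x < lo) or np.any(x > hi):
            return -np.inf
        # Toy quadratic surrogate prediction and Gaussian measurement-error likelihood.
        pred = 1.0 + 0.5 * x[0] - 0.2 * x[1] + 0.3 * x[0] * x[1]
        return -0.5 * ((pred - 1.1) / 0.05) ** 2

    x = np.zeros(2)
    samples = []
    for _ in range(5000):
        prop = x + 0.1 * rng.standard_normal(2)   # random-walk proposal
        if np.log(rng.uniform()) < log_post(prop) - log_post(x):
            x = prop
        samples.append(x.copy())
    samples = np.array(samples)                   # posterior draws for prediction UQ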

In practice a dataset can be inconsistent, which implies that something is wrong with the system: the model (e.g., the model does not simulate the underlying process accurately), the data (e.g., a misreported measurement), or both. Efficient strategies for resolving dataset inconsistency are therefore necessary for the framework to be useful. Methods motivated by physical reasoning can be advantageous since they may indicate which factors cause the trouble. The vector consistency measure strategy aims to recover dataset consistency by relaxing the fewest experimental uncertainty bounds. Data points whose experimental uncertainty bounds the solution vector suggests changing are labeled as potentially suspicious and examined further. Suspicious data points can also be identified by assessing their influence on the computed scalar consistency measure. A new method, suitable when the model is suspect, is developed in the dissertation; it resolves dataset inconsistency by including a scenario-dependent discrepancy function, which modifies the model output in a structured manner.
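
A toy version of the bound-relaxation idea, again assuming cvxpy: with linear stand-in surrogates the whole problem becomes a convex program, and a 1-norm penalty on per-experiment relaxations tends to change few bounds. The dissertation's vector consistency measure operates on quadratic surrogates with a more careful weighting; all numbers here are illustrative.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(3)
    n_exp, n_par = 5, 3
    M = rng.standard_normal((n_exp, n_par))  # linear stand-in surrogate coefficients
    L = -0.1 * np.ones(n_exp)
    U = 0.1 * np.ones(n_exp)
    U[0] = -0.5                              # one deliberately inconsistent bound

    x = cp.Variable(n_par)
    delta = cp.Variable(n_exp, nonneg=True)  # per-experiment bound relaxations
    constraints = [L - delta <= M @ x, M @ x <= U + delta,
                   x >= -1, x <= 1]
    # The 1-norm objective promotes relaxing as few bounds as possible.
    cp.Problem(cp.Minimize(cp.sum(delta)), constraints).solve()
    suspects = np.nonzero(delta.value > 1e-6)[0]  # experiments flagged for scrutiny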

Syngas, a mixture of H$_2$ and CO, is a popular candidate for high-efficiency power generation in hybrid turbines. Comprehensive and accurate knowledge of its combustion kinetics is necessary for designing optimal operating conditions in different applications. In one study, a B2BDC application to a syngas combustion dataset is carried out in collaboration with a research group at the German Aerospace Center. A syngas reaction mechanism is created by the German group with assessed uncertainty in the reaction rate parameters. A set of experimental measurements, including ignition delay times and laminar flame speeds, is collected by the German group with systematically assessed experimental uncertainty. I apply the B2BDC methods to the dataset and examine its feasible set to obtain a more predictive model of syngas combustion. In another study, I construct a different syngas combustion dataset containing only ignition delay time measurements, on which the vector consistency measure method and the discrepancy function method are compared.
