Inverse problems, which involve identifying the system inputs that produce desired outputs, encompass a wide class of tasks of great interest in scientific and engineering applications. Tackling such problems is challenging and requires leveraging all available information about the system. In such applications, this information typically consists of high-fidelity (HF) data, such as direct observations or experiments, which is expensive to obtain, and low-fidelity (LF) data, such as computer models or physics-based simulators, which is cheaper to obtain but less accurate and often comes with additional tuning parameters, separate from the system inputs, that must be estimated. A common approach to facilitating inverse problems in such contexts is multi-fidelity modeling (MFM), which comprises a wide variety of techniques that fuse data from multiple sources and may also calibrate (i.e., inversely estimate the tuning parameters of) the LF sources; we refer to these tasks collectively as data fusion. However, tackling inverse problems via modern MFM techniques is challenging and often requires domain-specific frameworks. Additionally, existing MFM techniques are limited in the number of data sources they can incorporate or calibrate and in the types of data they can process; they also do not quantify uncertainty (which is especially important in inverse problems and in low-data contexts) or otherwise aid in system identification beyond mimicking input-output behavior. In this dissertation, we develop data fusion approaches that address these gaps by leveraging manifold learning to facilitate data fusion and aid in uncertainty quantification and system identification. The contributions of this work comprise four research tasks, which we briefly discuss below.
Existing data fusion approaches are typically limited in the number of sources they can accommodate and often do not provide tools for uncertainty quantification or system identification. My first contribution is the development of data fusion via latent map Gaussian processes (LMGPs), which address these limitations by recasting data fusion as a latent-space learning problem and are applicable to a wide variety of problems. The relationships between data sources are learned and encoded in a low-dimensional manifold that aids learning, provides a visualization of the relationships between the HF and LF sources, and enables handling an arbitrary number of data sources. The kernel is also extended to enable calibration.
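The core idea can be illustrated with a minimal sketch: each data source is assigned a learned point in a low-dimensional latent space, and those latent coordinates are appended to the quantitative inputs inside an ordinary Gaussian kernel, so that the learned distances between latent points encode how correlated the sources are. The latent values and roughness parameters below are illustrative placeholders, not the dissertation's fitted quantities.

```python
import numpy as np

def lmgp_style_kernel(x1, x2, s1, s2, Z, theta):
    """Illustrative LMGP-style correlation: the source index s is mapped
    to a learned latent point Z[s], which is concatenated with the
    quantitative inputs before applying a Gaussian correlation function."""
    h1 = np.concatenate([x1, Z[s1]])  # augment inputs with latent coordinates
    h2 = np.concatenate([x2, Z[s2]])
    return np.exp(-np.sum(theta * (h1 - h2) ** 2))

# Two sources embedded in a 2D latent space (values purely illustrative)
Z = np.array([[0.0, 0.0],    # HF source at the origin
              [0.8, -0.3]])  # LF source placed at some learned distance
theta = np.ones(3)           # roughness parameters (1 input dim + 2 latent dims)

# Same input, same source -> full correlation; same input, different
# source -> correlation reduced by the latent distance between sources.
k_same = lmgp_style_kernel(np.array([0.5]), np.array([0.5]), 0, 0, Z, theta)
k_cross = lmgp_style_kernel(np.array([0.5]), np.array([0.5]), 0, 1, Z, theta)
```

Because the cross-source correlation depends only on the learned latent distance, plotting the latent points directly visualizes which LF sources agree with the HF data.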
My second contribution demonstrates LMGP's abilities by inversely designing microstructures for evaporative porous cooling, a challenging problem that balances the competing objectives of fluid flow through the pores and heat transfer through the solid phase. Microstructures are typically represented as voxel arrays that are too high-dimensional to design directly, and they can be rendered at a range of resolutions, with the cost and accuracy of property extraction increasing with resolution. To address this, we develop parameterizations via spectral density functions (SDFs), which represent structures with a tractable number of parameters, and we explore these parameters to generate microstructures at various resolutions that are then fused via LMGPs, providing a balance between cost and accuracy. The fitted LMGPs are then used in a multi-objective optimization loop to find a Pareto front of designs that maximize the objectives relevant to cooling performance.
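A Pareto front is the set of designs for which no other design is at least as good in every objective and strictly better in at least one. The sketch below shows the dominance check at the heart of any such multi-objective loop; the objective values are hypothetical stand-ins for the flow and heat-transfer metrics, and the brute-force loop is for illustration rather than the dissertation's optimizer.

```python
import numpy as np

def pareto_front(objectives):
    """Return indices of non-dominated rows, assuming all objectives are
    maximized. Row j dominates row i if it is >= i in every objective
    and > i in at least one."""
    n = objectives.shape[0]
    front = []
    for i in range(n):
        dominated = any(
            np.all(objectives[j] >= objectives[i]) and
            np.any(objectives[j] > objectives[i])
            for j in range(n) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Hypothetical (flow, heat-transfer) scores for four candidate designs
scores = np.array([[0.9, 0.2],
                   [0.5, 0.5],
                   [0.2, 0.9],
                   [0.3, 0.3]])  # dominated by [0.5, 0.5]
front = pareto_front(scores)
```

In practice the expensive property evaluations are replaced by the fitted LMGP predictions, so the optimizer can explore many SDF parameter settings cheaply.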
While LMGPs are a powerful data fusion approach, they are based on GPs, which do not scale well to large or high-dimensional data. My third contribution is the development of probabilistic neural data fusion (Pro-NDF), which employs the same manifold learning concepts behind LMGPs but uses neural networks (NNs) instead of GPs, as NNs provide improved flexibility and the ability to scale to large data. Pro-NDF splits the learning task into blocks that learn low-dimensional manifolds representing the relationships between data sources and categorical variables. It employs Bayesian neural networks (BNNs) to learn the source manifold and represents the output as a parameterized distribution, quantifying and separating uncertainties, and it is trained via a novel loss function incorporating a proper scoring rule. Pro-NDF's flexibility and scalability are demonstrated on analytic examples and real-world datasets.
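A proper scoring rule is a loss that is minimized in expectation only when the predicted distribution matches the true one, so it rewards calibrated uncertainty rather than just accurate means. The sketch below uses the Gaussian negative log-likelihood, one well-known strictly proper scoring rule, purely as an example; the specific rule and loss composition used in Pro-NDF may differ.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under a Gaussian predictive
    distribution N(mu, sigma^2). As a strictly proper scoring rule, it
    penalizes both inaccurate means and miscalibrated variances."""
    return 0.5 * np.log(2 * np.pi * sigma**2) + (y - mu)**2 / (2 * sigma**2)

y = 1.0  # observed target
# Same mean error, different claimed confidence: the overconfident
# prediction (tiny sigma) is penalized far more heavily.
loss_calibrated = gaussian_nll(y, mu=0.5, sigma=0.5)
loss_overconfident = gaussian_nll(y, mu=0.5, sigma=0.05)
```

Training a network that outputs (mu, sigma) against such a loss is what lets Pro-NDF-style models represent the output as a parameterized distribution rather than a point estimate.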
Unlike LMGP, Pro-NDF only enables MFM and not calibration. I address this through my fourth contribution, which develops probabilistic neural calibration (Pro-NC), an extension of Pro-NDF with greatly improved capabilities that allows simultaneous calibration of any number of LF sources. Pro-NC employs manifolds to learn the relationships between data sources and categorical combinations as well as the effects of the calibration inputs on the LF sources. It represents the learned calibration parameters and the output via distributions, again providing separable uncertainty estimates. This work also extends Pro-NDF's loss function to enable calibration and develops an algorithm for weighting the loss terms that facilitates accurate multi-task learning. Pro-NC is validated on a number of analytic examples and will be applied to a complex materials science problem in the future.
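The need for loss-term weighting arises because a multi-task loss sums terms (e.g., a prediction term and a calibration term) whose raw magnitudes can differ by orders of magnitude, letting one task dominate training. The sketch below shows one simple, hypothetical balancing scheme, scaling each term by the inverse of its initial magnitude; it is not the dissertation's algorithm, only an illustration of the problem being solved.

```python
import numpy as np

def balance_weights(initial_losses):
    """Hypothetical multi-task weighting: scale each loss term so that,
    at the initial loss values, every weighted term contributes equally
    to the total. Not the dissertation's weighting algorithm."""
    initial_losses = np.asarray(initial_losses, dtype=float)
    return initial_losses.mean() / initial_losses

# e.g., a prediction loss, a calibration loss, and a regularization term
losses = np.array([10.0, 0.5, 2.0])
w = balance_weights(losses)
weighted = w * losses  # each weighted term is now equal
```

More sophisticated schemes adapt the weights during training, but the goal is the same: keep all tasks, prediction and calibration alike, learning at comparable rates.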