The Perron–Frobenius and Koopman operators provide natural dual settings to investigate the dynamics of complex systems. In this thesis we focus on certain pertinent concepts and strategies for obtaining dynamical models and approximating the transfer operators from data.
First, we explain the setting and the assumptions that underlie the so-called Dynamic Mode Decomposition (DMD); this methodology relates to Koopman-operator approximation through full-state observables. The goal is to highlight caveats as well as to suggest metrics that indicate whether the use of DMD on a specific dataset is warranted. In many applications, only a limited number of data samples is available for modeling an otherwise exceedingly high-dimensional process. The dimensionality of the process, which may represent visual or distributional fields, in conjunction with the limited observation record, requires careful analysis. It is precisely this regime of ``small data,'' i.e., ``few samples,'' that has been a challenge in traditional signal analysis since its inception. DMD is a recent development that aims to identify suitable linear dynamics that can explain the data.
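As a minimal sketch of the standard DMD setting (the symbols $x_k$, $X$, $Y$, and $A$ are notation assumed here for illustration), given state snapshots $x_0, x_1, \ldots, x_N$ one forms the data matrices $X = [x_0, \ldots, x_{N-1}]$ and $Y = [x_1, \ldots, x_N]$ and seeks a linear map that best explains the one-step evolution in the least-squares sense,
\[
\hat{A} \in \arg\min_{A} \, \| Y - A X \|_F^2, \qquad \hat{A} = Y X^{\dagger},
\]
where $\| \cdot \|_F$ is the Frobenius norm and $X^{\dagger}$ denotes the Moore--Penrose pseudoinverse of $X$. When the number of samples is much smaller than the state dimension, this problem is underdetermined, and $Y X^{\dagger}$ is one minimizer among many, which is one way to see why the few-samples regime calls for care.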
We show how the concept of the gap metric can be used as a tool to quantify how subspaces spanned by data impact modeling assumptions. Also, the gap metric provides guidance in selecting appropriate dimensionality for models for such processes.
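For reference, one common definition of the gap (with notation assumed here for illustration) compares two closed subspaces $\mathcal{X}$ and $\mathcal{Y}$ through their orthogonal projections $\Pi_{\mathcal{X}}$ and $\Pi_{\mathcal{Y}}$,
\[
\delta(\mathcal{X}, \mathcal{Y}) = \| \Pi_{\mathcal{X}} - \Pi_{\mathcal{Y}} \|,
\]
where $\| \cdot \|$ is the induced operator norm. For finite-dimensional subspaces of equal dimension, $\delta(\mathcal{X}, \mathcal{Y})$ equals the sine of the largest principal angle between them, so it takes values in $[0,1]$ and vanishes only when the two subspaces coincide.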
Next, we formulate and solve a regression problem with time-stamped distributional data. Distributions are considered as points in the Wasserstein space of probability measures, metrized by the $2$-Wasserstein metric, and may represent images, power spectra, point clouds of particles, and so on. The data sets may be thought to represent densities of particles whose precise trajectories are not available (e.g., partially observed).
The regression seeks a curve in the Wasserstein space that passes closest to the dataset.
Our regression problem allows utilizing general curves in a Euclidean setting (linear, quadratic, sinusoidal, and so on), lifted to corresponding measure-valued curves in the Wasserstein space. It represents a relaxation of geodesic regression in Wasserstein space. The apparently nonlinear primal problem can be recast as a multi-marginal optimal transport problem, leading to a formulation as a linear program. Entropic regularization and a generalized Sinkhorn algorithm can be effectively employed to solve this multi-marginal problem.
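To illustrate the mechanism in the familiar two-marginal special case (a sketch only; the symbols $a$, $b$, $C$, $\epsilon$, $K$, $u$, and $v$ are assumed here and the multi-marginal formulation differs in its details), the entropy-regularized transport problem between discrete marginals $a$ and $b$ with cost matrix $C$ reads
\[
\min_{\pi \geq 0,\ \pi \mathbf{1} = a,\ \pi^{T} \mathbf{1} = b} \ \langle C, \pi \rangle + \epsilon \sum_{i,j} \pi_{ij} \left( \log \pi_{ij} - 1 \right).
\]
Its minimizer has the form $\pi = \mathrm{diag}(u) \, K \, \mathrm{diag}(v)$ with $K = \exp(-C/\epsilon)$ taken entrywise, and the Sinkhorn iteration alternately rescales to match each marginal,
\[
u \leftarrow a \oslash (K v), \qquad v \leftarrow b \oslash (K^{T} u),
\]
where $\oslash$ denotes entrywise division. In a multi-marginal generalization, the matrix $K$ is replaced by a tensor and the iteration cycles through one scaling vector per marginal.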
The proposed framework can be used to estimate correlation between given distributional snapshots. Potential applications of the theory are envisioned in aggregate data inference, estimation of meta-population dynamics, power spectra tracking, and, more generally, system identification.
Finally, we introduce a regression-type formulation for approximating the Perron–Frobenius operator by relying on distributional snapshots of data.
%These snapshots may represent densities of particles.
The Wasserstein metric is leveraged to define a suitable functional optimization in the space of distributions, weighing the distances between successive distributional snapshots. The formulation allows seeking suitable dynamics so as to interpolate the distributional flow in function space. A first-order necessary condition for optimality is derived and utilized to construct a gradient-flow approximation algorithm. It should be noted that we assume no information on statistical dependence between successive pairs of distributions. The method extends to the search for nonlinear dynamics, assuming a suitable parametrization of the nonlinear state transition map in terms of selected basis functions.