Lu, Qiujing

Real-world High-dimensional Data with Multimodal Distributions: Mode Discovery and Mode-preserving Generative Models

2022

Advisor(s): Roychowdhury, Vwani P

Abstract

Recent advances in Artificial Intelligence (AI) have demonstrated extraordinary performance in tasks relating to our daily activities. However, the “intelligence” of the machine is still far below our expectations, especially when it concerns the ability to model the observable world in a reliable and predictive fashion. This is a challenging task, as real-world data is not only high-dimensional but also has distributions with multiple modes and different modalities. Identifying such hidden modes and developing succinct generative models that can generalize to unseen scenarios are difficult problems. If one can solve such versatile data modeling problems, then one could take effective actions to achieve desired goals, enabling AI to display true cognitive abilities.
In this thesis, we model real-world high-dimensional data as having an inherently low-dimensional generative mechanism: a set of computational laws, indexed by a hierarchical mixture of contexts, projects low-dimensional data into the observable high-dimensional multimodal data sets. While reverse engineering such a universal generative mechanism purely from data is a seemingly impossible task, a spectrum of techniques is available for handling cases where partial information about the underlying contexts and modes is given. The objective of the thesis is to address the problem of discovering and characterizing modes along this spectrum of prior information, and to demonstrate the effectiveness of our techniques in several real-world applications. For example, as a step towards discovering low-dimensional modes of high-dimensional data, we propose an efficient and novel algorithm called eNMF for the non-negative matrix factorization (NMF) problem, which is widely used for interpretable dimensionality reduction tasks.
Then we address the problem of refining the modes in electroencephalogram (EEG) data, where ground truth labels (as assumed in a strictly supervised approach) are not available and only partial and noisy labels can be inferred. A class of events in EEG data, referred to as high-frequency oscillations (HFOs), has been found to be a promising biomarker of the epileptogenic zone in the brains of patients with epilepsy. HFOs, however, can be generated by healthy tissues as well, and further differentiation of the HFOs into epileptogenic HFOs (eHFOs) and non-eHFOs is needed for more accurate localization of the problematic zone. Going further along the spectrum, we address the creation of generative models when only the high-level contexts are given, and demonstrate that this facilitates the fine-level discovery of modes within the given contexts. In particular, we construct context-aware 3D human motion generation models that enable on-demand sampling of discovered and interpolated modes within different categories of action types, and customization of motion trajectories not present in the training data. Finally, we tackle the problem of creating generative models for scenarios where the observable data is created by an agent that interacts sequentially with real physical environments. We show how one can create generative models that not only imitate diverse behavioral data, but do so robustly while interacting with an external environment.

UCLA
