Scalable Measure Transportation and Applications in Machine Learning and Human Computer Interfaces
The field of optimal transportation is a broad area of theory concerned with computing a mapping that transforms one probability distribution into another. Decades of research on computational strategies for the transport problem have led to applications in areas such as statistics, economics, machine learning, and computer science. However, these algorithms have significant limitations, including the persistent difficulty of scaling to larger datasets, both in dimension and in the number of available samples from which we would like to extract useful information. Furthermore, although the past few decades have seen a dramatic increase in the availability of powerful distributed computational resources, such as publicly available CPU clusters and GPU compute nodes, many modern algorithmic approaches to the transport problem are not specifically designed with parallelism in mind to exploit such resources. In a world where continuous interaction between human users and distributed computational resources drives much of modern life, this capability is critical, especially for applications that are expected to work in real time.
Building upon previous research from our group, this work investigates a parallelized computational framework for constructing optimal transport maps that is designed with remote scalability in mind. It focuses on a CUDA-based implementation of the framework that uses GPU computational resources to analyze data in higher dimensions than are often seen in the field of computational optimal transportation, and that furthermore scales with the availability of additional hardware. We also present applications of this framework in several areas of machine learning, with an emphasis on human-computer interfaces, including a novel multi-user brain-computer interface, a classification problem pertaining to automated sleep staging based on electroencephalographic (EEG) data, and an example of generative modeling using the MNIST dataset. Finally, we discuss a future direction for applying this framework to preference-based deep reinforcement learning.