eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Reproducing kernel space embeddings and metrics on probability measures

Abstract

The notion of Hilbert space embedding of probability measures has recently been used in various statistical applications such as dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseudometric on the space of probability measures can be defined as the distance between distribution embeddings; we denote it by γ_k, indexed by the positive definite (pd) kernel function k that defines the inner product in the RKHS. In this dissertation, various theoretical properties of γ_k and the associated RKHS embedding are presented. First, for γ_k to be useful in practice, it is essential that it be a metric and not just a pseudometric. Therefore, various easily checkable characterizations are obtained for k such that γ_k is a metric (such k are referred to as characteristic kernels), in contrast to previously published characterizations, which are either difficult to check or apply only in restricted circumstances (e.g., on compact domains). Second, the relation of characteristic kernels to the richness of the RKHS (how well an RKHS approximates some target function space) and to other common notions of pd kernels, such as strictly pd (spd), integrally spd, and conditionally spd, is studied. Third, the nature of the topology induced by γ_k is studied, and it is shown that γ_k associated with integrally spd kernels (a stronger notion than a characteristic kernel) metrizes the weak* (weak-star) topology on the space of probability measures. Fourth, γ_k is compared to integral probability metrics (IPMs) and φ-divergences, and it is shown that the empirical estimator of γ_k is simple to compute and exhibits a fast rate of convergence compared to those of IPMs and φ-divergences. These properties make γ_k more applicable in practice than these other families of distances. Finally, a novel notion of embedding probability measures into a reproducing kernel Banach space (RKBS) is proposed and its properties are studied. It is shown that the proposed embedding and its properties generalize their RKHS counterparts, thereby resulting in richer distance measures on the space of probability measures.
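
To illustrate why an empirical estimate of the embedding distance is simple to compute, here is a minimal sketch. It relies on the standard identity that the squared RKHS distance between two mean elements equals E k(X, X') - 2 E k(X, Y) + E k(Y, Y'), and replaces each expectation with a sample average. The Gaussian kernel, the bandwidth `sigma`, the choice of the plug-in (biased) average, and all function names are assumptions made for illustration; they are not taken from the dissertation and may differ from the estimator it analyzes.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on R^d (assumed here as an example kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, kernel):
    """Average of kernel(x, y) over all pairs (x, y) in xs x ys."""
    total = sum(kernel(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def empirical_gamma_k(xs, ys, kernel=gaussian_kernel):
    """Plug-in estimate of the embedding distance from two samples.

    xs and ys are non-empty lists of equal-length tuples of floats,
    treated as samples from the two probability measures being compared.
    """
    squared = (
        mean_kernel(xs, xs, kernel)
        - 2.0 * mean_kernel(xs, ys, kernel)
        + mean_kernel(ys, ys, kernel)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))


if __name__ == "__main__":
    sample_p = [(0.0,), (1.0,), (0.5,)]
    sample_q = [(5.0,), (6.0,), (5.5,)]
    print(empirical_gamma_k(sample_p, sample_q))
    print(empirical_gamma_k(sample_p, sample_p))
```

The computation needs only pairwise kernel evaluations, so its cost in this naive form is quadratic in the sample sizes, with no optimization or density estimation step.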
