We consider approximating distributions within the framework of optimal mass
transport and specialize to the problem of clustering data sets. Distances
between distributions are measured in the Wasserstein metric. The main problem
we consider is that of approximating sample distributions by ones with sparse
support. This provides a new viewpoint on clustering. We propose different
relaxations of a cardinality function which penalizes the size of the support
set. We establish that a certain relaxation provides the tightest convex lower
approximation to the cardinality penalty. We compare the performance of
the alternative relaxations in a numerical study on clustering.