We investigate here the behavior of the standard k-means clustering
algorithm and several alternatives to it: the k-harmonic means algorithm due to
Zhang and colleagues, fuzzy k-means, Gaussian expectation-maximization, and two
new variants of k-harmonic means. Our aim is to find which aspects of these
algorithms contribute to finding good clusterings, as opposed to converging to
a low-quality local optimum. We describe each algorithm in a unified framework
that introduces separate cluster membership and data weight functions. We then
show that the algorithms do behave very differently from each other on simple
low-dimensional synthetic datasets, and that the k-harmonic means method is
superior. Having a soft membership function is essential for finding
high-quality clusterings, but having a non-constant data weight function is
useful also.
Pre-2018 CSE ID: CS2002-0702