eScholarship
Open Access Publications from the University of California

Some Computational Issues in Cluster Analysis with No A Priori Metric

  • Author(s): Coleman, Dan
  • Dong, Xiaopeng
  • Hardin, Johanna
  • Rocke, David
  • Woodruff, David
  • et al.
Abstract

Recent interest in data mining and knowledge discovery underscores the need for methods by which patterns can be discovered in data without any prior knowledge of their existence. In this paper, we explore computational methods of finding clusters of multivariate data points when there is no metric given a priori. We are given a sample, X, of n points in R^p that come from g distinct multivariate normal populations with unknown parameters, each of which contributes in excess of p points. Assuming that we are given the number of groups, g, and a computational budget of T seconds of computer time, the paper reviews choices for cluster finding that have been described in the literature and introduces a new method that is a structured combination of two of them. We investigate these algorithms on some real data sets and describe simulation experiments. A principal conclusion is strong support for the contention that a two-stage algorithm based on a combinatorial search followed by the EM algorithm is the best way to find clusters.
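
As a rough illustration of the two-stage idea described in the abstract, the following Python sketch runs a time-budgeted search over hard partitions and then refines the best partition with the EM algorithm for a Gaussian mixture. The abstract does not specify the paper's combinatorial search, so a simple random-restart reassignment search stands in for it here. The function names (`search_partition`, `em_refine`, `two_stage`), the `ridge` regularizer, and the iteration caps are all assumptions made for this example, and the sketch assumes n is at least g(p + 1) so that every starting group holds more than p points.

```python
import time
import numpy as np

def joint_logp(X, w, mu, cov):
    """n-by-g matrix of log(w_j) + log N(x_i | mu_j, cov_j)."""
    p = X.shape[1]
    cols = []
    for wj, mj, cj in zip(w, mu, cov):
        diff = X - mj
        _, logdet = np.linalg.slogdet(cj)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cj), diff)
        cols.append(np.log(wj) - 0.5 * (p * np.log(2 * np.pi) + logdet + maha))
    return np.column_stack(cols)

def fit_groups(X, labels, g, ridge=1e-6):
    """Weights, means and covariances implied by a hard partition."""
    p = X.shape[1]
    w = np.array([np.mean(labels == j) for j in range(g)])
    mu = np.array([X[labels == j].mean(axis=0) for j in range(g)])
    cov = np.array([np.cov(X[labels == j].T) + ridge * np.eye(p)
                    for j in range(g)])
    return w, mu, cov

def search_partition(X, g, budget_s, rng):
    """Stage 1 stand-in (assumption): random restarts plus reassignment,
    keeping the partition with the highest classification log-likelihood."""
    n, p = X.shape
    best_labels, best_score = None, -np.inf
    deadline = time.monotonic() + budget_s
    # Always run at least one restart, then continue until the budget ends.
    while best_labels is None or time.monotonic() < deadline:
        labels = rng.permutation(np.arange(n) % g)  # balanced random start
        for _ in range(20):
            new = joint_logp(X, *fit_groups(X, labels, g)).argmax(axis=1)
            too_small = np.bincount(new, minlength=g).min() <= p
            if too_small or np.array_equal(new, labels):
                break  # keep the last partition with > p points per group
            labels = new
        logp = joint_logp(X, *fit_groups(X, labels, g))
        score = logp[np.arange(n), labels].sum()
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels

def em_refine(X, labels, g, iters=100, ridge=1e-6):
    """Stage 2: EM for a Gaussian mixture, started from a hard partition."""
    n, p = X.shape
    w, mu, cov = fit_groups(X, labels, g, ridge)
    for _ in range(iters):
        # E-step: responsibilities, computed stably in log space.
        logp = joint_logp(X, w, mu, cov)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted updates of weights, means and covariances.
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        for j in range(g):
            d = X - mu[j]
            cov[j] = (r[:, j, None] * d).T @ d / nk[j] + ridge * np.eye(p)
    # Labels come from the responsibilities of the final E-step.
    return w, mu, cov, r.argmax(axis=1)

def two_stage(X, g, budget_s=1.0, seed=0):
    """Time-budgeted partition search followed by EM refinement."""
    rng = np.random.default_rng(seed)
    start = search_partition(X, g, budget_s, rng)
    return em_refine(X, start, g)
```

Calling `two_stage(X, g=2, budget_s=0.5)` on an n-by-p array `X` returns the estimated mixing weights, means, covariances, and a hard label for each point. In this sketch the budget of T seconds applies only to the first stage, and the EM stage runs for a fixed number of iterations; how the paper divides its budget between the stages is not stated in the abstract.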
