Theory/O.R. Schedules-ALADDIN Center

Metric Clustering
Claire Kenyon, Ecole Polytechnique and Institut Universitaire de France
Dec 12, 2003, Wean 7220, 3:30pm

Abstract:

Given n data items in a general metric space, how can we efficiently partition them into k clusters of "similar" items? There are many models for this ubiquitous problem, which arises in application areas such as data mining, image recognition, medical imaging, web analysis, etc.

One measure of the quality of a k-clustering is the sum of all pairwise distances inside the clusters, which must be minimized. We discuss techniques and algorithms, first for the complementary problem, which can be seen as a metric analog of Max-Cut in graphs, then for
2-clustering, and finally sketch extensions to variants with other objective
unctions or with cardinality constraints. The algorithms are based on random sampling.