Metric Clustering
Claire Kenyon, Ecole Polytechnique
and Institut Universitaire de France
Dec 12, 2003, Wean 7220, 3:30pm
Abstract:
Given n data items in a general metric space, how can we efficiently
partition them into k clusters of "similar" items? There
are many models for this ubiquitous problem, which arises in application
areas such as data mining, image recognition, medical imaging,
web analysis, etc.
One measure of the quality of a k-clustering is the sum of all
pairwise distances inside the clusters, which must be minimized.
We discuss techniques and algorithms, first for the complementary
problem, which can be seen as a metric analog of Max-Cut in graphs,
then for
2-clustering, and finally sketch extensions to variants with other
objective
unctions or with cardinality constraints. The algorithms are based
on random sampling.