Optimal Time Bounds for Approximate
Clustering

Ramgopal Mettu

Abstract:

In this talk, I will present a fast randomized constant-factor
approximation algorithm for the uncapacitated k-median problem.
More specifically, our algorithm achieves an optimal running time
of O(nk) for a wide range of k, i.e., between \log{n} and n/\log^2{n}.
Our algorithm is based on a technique that we call "successive
sampling" that allows us to rapidly identify a small (just
O(k/log{n/k})) set of points that capture the essence of the input.
We show that it is possible to rapidly construct a small problem
instance from the sampled points and apply an existing k-median
algorithm to obtain a set of k points whose cost is within a constant
factor of the optimal solution for the original problem instance.
Finally, I will discuss an application of our algorithm to the
problem of "k-means clustering," and present some preliminary
experimental results that indicate that our algorithm may be useful
in practice.