Theory/O.R. Schedules-Aladdin Center

Optimal Time Bounds for Approximate Clustering
Ramgopal Mettu

Abstract:

In this talk, I will present a fast randomized constant-factor approximation algorithm for the uncapacitated k-median problem. More specifically, our algorithm achieves an optimal running time of O(nk) for a wide range of k, i.e., between \log{n} and n/\log^2{n}. Our algorithm is based on a technique that we call "successive sampling" that allows us to rapidly identify a small (just O(k/log{n/k})) set of points that capture the essence of the input. We show that it is possible to rapidly construct a small problem instance from the sampled points and apply an existing k-median algorithm to obtain a set of k points whose cost is within a constant factor of the optimal solution for the original problem instance. Finally, I will discuss an application of our algorithm to the problem of "k-means clustering," and present some preliminary experimental results that indicate that our algorithm may be useful in practice.