Computational Complexity - Research Seminar
Bodo Manthey: Improved Smoothed Analysis of k-Means Clustering II
July 8, 2008, 10:15 a.m.
E 1 3, Room 415
The k-means algorithm is one of the most popular clustering algorithms. While this algorithm is known for its speed in practice, its worst-case performance is poor: Its running-time can be exponential in the number of points, and the worst-case upper bound of nO(kd) is way off the observed running-time. Arthur and Vassilvitskii (2006) aimed at closing the gap between theoretical and practical performance, and they proved that the smoothed running-time of k-means is polynomial in nk and 1/σ, where σ is the standard
deviation of the Gaussian perturbation.
We improve on their result and show that the smoothed running-time is polynomial in n√k and 1/σ. Furthermore, we show that the smoothed running-time is polynomial in n and 1/σ if the number k of clusters and the dimension d of the data points are small compared to the number of points. Finally, we prove a polynomial bound for the smoothed running-time of k-means in one dimension.