ePrints@IISc

Clustering Large Graphs via the Singular Value Decomposition

Drineas, P and Frieze, A and Kannan, R and Vempala, S and Vinay, V (2004) Clustering Large Graphs via the Singular Value Decomposition. In: Machine Learning, 56 (1-3). pp. 9-33.

PDF: 24.pdf (Published Version, 171 kB)
Restricted to registered users only
Official URL: http://www.springerlink.com/content/u424k6nn6k6227...

Abstract

We consider the problem of partitioning a set of m points in n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed) so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al., 2000). We prove that this problem is NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances from the m points to V. This relaxation can be solved by computing the Singular Value Decomposition (SVD) of the m × n matrix A that represents the m points; this solution can be used to get a 2-approximation algorithm for the original problem. We then argue that the relaxation in fact provides a generalized clustering that is useful in its own right. Finally, we show that the SVD of a random submatrix of a given matrix, chosen according to a suitable probability distribution, provides an approximation to the SVD of the whole matrix, thus yielding a very fast randomized algorithm. We expect this algorithm to be the main contribution of this paper, since it can be applied to the very large problems that typically arise in modern applications.
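The relaxation and the 2-approximation described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' code; the helper names (kmeans_cost, exact_kmeans, svd_relaxation, svd_then_cluster) are hypothetical. The reasoning it checks is as follows. The k optimal centers span a linear subspace of dimension at most k, so the relaxed cost ||A - A_k||_F^2 (the sum of the squared singular values beyond the k-th) is a lower bound on the optimal k-means cost OPT. Next, cluster the points after projecting them onto V. Because each projected centroid lies in V, Pythagoras bounds the cost of that partition on the original points by ||A - A_k||_F^2 plus the projected optimum. Both terms are at most OPT. The exhaustive search in exact_kmeans is an assumption standing in for solving the projected problem exactly. It is exponential in m and only usable on toy inputs.

```python
import itertools

import numpy as np


def kmeans_cost(points, labels, k):
    """Sum of squared distances from each point to the centroid of its cluster."""
    cost = 0.0
    for j in range(k):
        members = points[labels == j]
        if len(members):  # empty clusters contribute nothing
            cost += float(((members - members.mean(axis=0)) ** 2).sum())
    return cost


def exact_kmeans(points, k):
    """Optimal k-means partition by brute force over all k**m labelings.

    Hypothetical stand-in for an exact solver; exponential in m, toy inputs only.
    """
    best_cost, best_labels = np.inf, None
    for assignment in itertools.product(range(k), repeat=len(points)):
        labels = np.array(assignment)
        cost = kmeans_cost(points, labels, k)
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels


def svd_relaxation(A, k):
    """Best-fit k-dimensional subspace V of the rows of A and its cost.

    Returns the top-k right singular vectors (orthonormal rows spanning V)
    and the relaxed cost sum_{i>k} sigma_i^2 = ||A - A_k||_F^2.
    """
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k], float((s[k:] ** 2).sum())


def svd_then_cluster(A, k):
    """Project the rows of A onto V, cluster there exactly, score on A itself."""
    Vk, _ = svd_relaxation(A, k)
    projected = A @ Vk.T  # m x k coordinates of each point inside V
    _, labels = exact_kmeans(projected, k)
    return kmeans_cost(A, labels, k), labels


rng = np.random.default_rng(0)
A = np.vstack([rng.normal(0.0, 1.0, (4, 5)), rng.normal(4.0, 1.0, (4, 5))])
opt, _ = exact_kmeans(A, 2)
_, relaxed = svd_relaxation(A, 2)
approx, labels = svd_then_cluster(A, 2)
print(f"relaxed={relaxed:.3f}  optimal={opt:.3f}  svd-based={approx:.3f}")
```

On any input, the printed values should satisfy relaxed <= optimal <= svd-based <= 2 * optimal, up to floating-point rounding.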

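The abstract does not state which probability distribution is used to pick the random submatrix. The sketch below assumes length-squared row sampling, a standard choice in this line of work, and may differ from the paper's exact scheme and sample size. Row i is kept with probability proportional to its squared norm and rescaled so that S^T S is an unbiased estimate of A^T A. Only the small s × n matrix S is decomposed. The function names, the sample size s = 200, and the synthetic test matrix are all hypothetical.

```python
import numpy as np


def sampled_subspace(A, k, s, rng):
    """Approximate the top-k right singular vectors of A from s sampled rows.

    Assumption: length-squared sampling. Row i is drawn (with replacement)
    with probability p_i = |A_i|^2 / |A|_F^2 and rescaled by 1 / sqrt(s * p_i),
    so that E[S^T S] = A^T A. Requires s >= k.
    """
    row_norms_sq = (A ** 2).sum(axis=1)
    p = row_norms_sq / row_norms_sq.sum()
    idx = rng.choice(A.shape[0], size=s, replace=True, p=p)
    S = A[idx] / np.sqrt(s * p[idx])[:, None]
    _, _, Vt = np.linalg.svd(S, full_matrices=False)  # SVD of the s x n sample only
    return Vt[:k]


def residual(A, Vk):
    """Sum of squared distances from the rows of A to the span of the rows of Vk."""
    return float(((A - A @ Vk.T @ Vk) ** 2).sum())


rng = np.random.default_rng(0)
m, n, k = 2000, 50, 3
# Hypothetical test matrix: rank-k signal plus small Gaussian noise.
A = rng.normal(size=(m, k)) @ rng.normal(size=(k, n)) + 0.01 * rng.normal(size=(m, n))
Vk_sampled = sampled_subspace(A, k, s=200, rng=rng)
_, _, Vt = np.linalg.svd(A, full_matrices=False)
print("exact SVD residual:  ", residual(A, Vt[:k]))
print("sampled SVD residual:", residual(A, Vk_sampled))
print("||A||_F^2:           ", float((A ** 2).sum()))
```

The exact SVD residual is optimal among all k-dimensional subspaces, so the sampled residual can only be larger. For this scheme the usual guarantee is an additive error term proportional to ||A||_F^2, which shrinks as s grows.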
Item Type: Journal Article
Publication: Machine Learning
Publisher: Springer
Additional Information: Copyright of this article belongs to Springer.
Keywords: Singular Value Decomposition; randomized algorithms; k-means clustering.
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 18 Dec 2008 09:50
Last Modified: 19 Sep 2010 04:53
URI: http://eprints.iisc.ac.in/id/eprint/16743
