
Finding a latent k-simplex in O*(k · nnz(data)) time via Subset Smoothing

Bhattacharyya, C and Kannan, R (2020) Finding a latent k-simplex in O*(k · nnz(data)) time via Subset Smoothing. In: 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, 5-8 January 2020, Salt Lake City; United States, pp. 122-140.

Official URL: https://dx.doi.org/10.1137/1.9781611975994.8


In this paper we show that the learning problem for a large class of latent variable models, such as Mixed Membership Stochastic Block Models, Topic Models, and Adversarial Clustering, can be posed geometrically as follows: find a latent k-vertex simplex K in R^d, given n data points, each obtained by perturbing a latent point in K. This problem does not seem to have been addressed. Our main contribution is an efficient algorithm for the geometric problem under deterministic assumptions which naturally hold for the models considered here. We observe that for a suitable r ≤ n, K is close to a data-determined polytope K' (the subset smoothed polytope), which is the convex hull of the (n choose r) points, each obtained by averaging an r-subset of the data points. Our algorithm is simply stated: it optimizes k carefully chosen linear functions over K' to find the k vertices of the latent simplex. The proof of correctness is more involved, drawing on existing and new tools from Numerical Analysis. Our overall runtime of O*(k · nnz) is as good as the best times of existing algorithms (modulo O*(1) factors) for the special cases and is better for sparse data, which is the norm in Topic Modelling and Mixed Membership models.

Some consequences of our algorithm are:

• Mixed Membership Models and Topic Models: We give the first quasi-input-sparsity time algorithm for parameter estimation for k ∈ O*(1).
• Adversarial Clustering: In k-means, an adversary is allowed to move many data points from each cluster towards the convex hull of other cluster centers. Our algorithm still estimates cluster centers well.

Copyright © 2020 by SIAM
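The abstract's central step, optimizing a linear function over the subset smoothed polytope K', can be illustrated without enumerating all (n choose r) subset averages. Because a linear function on a convex hull attains its maximum at one of the generating points, and the value of a linear function at an average equals the average of its values, the maximizer is the mean of the r data points with the largest scores. The sketch below shows only this one step for an arbitrary direction u; it is not the authors' algorithm, which additionally specifies how the k directions are chosen. All function names, the toy data, and the direction u are hypothetical.

```python
from itertools import combinations


def dot(u, x):
    """Inner product of two equal-length sequences."""
    return sum(ui * xi for ui, xi in zip(u, x))


def mean(points):
    """Coordinate-wise average of a non-empty collection of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def argmax_over_smoothed_polytope(data, u, r):
    """Maximize u.x over K' = conv{ mean(S) : S is an r-subset of data }.

    u.mean(S) is the average of u.x over S, so the best r-subset consists
    of the r points with the highest scores u.x. No enumeration is needed.
    """
    top = sorted(data, key=lambda x: dot(u, x), reverse=True)[:r]
    return mean(top)


def brute_force_max_value(data, u, r):
    """Reference check: enumerate every r-subset (feasible only for tiny n)."""
    return max(dot(u, mean(S)) for S in combinations(data, r))


# Toy data (assumed for illustration): 6 points in R^2, direction u, r = 2.
data = [[0.0, 0.1], [1.0, 0.0], [0.9, 0.2], [0.1, 1.0], [0.0, 0.8], [0.4, 0.4]]
u = [1.0, -0.5]
r = 2
best = argmax_over_smoothed_polytope(data, u, r)
```

This sketch sorts all n scores for clarity; computing the scores touches each nonzero data entry once, and the top r could instead be found with a selection routine rather than a full sort.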

Item Type: Conference Paper
Publication: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
Publisher: Association for Computing Machinery
Additional Information: Cited by 0; Conference: 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020; Conference Date: 5 January 2020 through 8 January 2020; Conference Code: 159165
Keywords: Computational geometry; Stochastic models; Stochastic systems; Geometric problems; Latent variable models; Learning problem; Linear functions; Membership models; Proof of correctness; Stochastic block models; Time algorithms; Clustering algorithms
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 01 Jan 2021 08:15
Last Modified: 01 Jan 2021 08:15
URI: http://eprints.iisc.ac.in/id/eprint/65436
