
HaVAT: Automatic Cluster Structure Assessment in Unlabeled data

Pagadala, KM and Rathore, P (2024) HaVAT: Automatic Cluster Structure Assessment in Unlabeled data. In: ACM International Conference Proceeding Series, pp. 45-53.

Full text: Published Version (PDF, 2 MB), restricted to registered users only.
Official URL: https://doi.org/10.1145/3632410.3632447

Abstract

Clustering algorithms often require the desired number of clusters, denoted k, as input to partition the data. However, a crucial question arises: do the data truly exhibit clusters, and if so, how many? Answering this question is known as clustering tendency assessment. The Visual Assessment of Clustering Tendency (VAT) family of algorithms has gained popularity among researchers in various domains, and many variants of VAT have been proposed recently. These algorithms visually estimate the cluster structure and the number of clusters in the input dataset by reordering the pairwise dissimilarity matrix and rendering it as a grayscale image called the reordered dissimilarity matrix image (RDI). Dark blocks in the RDI, formed by pixels with low dissimilarity values, indicate potential clusters within the data. Although the VAT family has proven valuable for estimating clusters in diverse datasets, interpreting its output manually can be challenging, particularly for overlapping clusters or clusters with complex geometries. In this paper, we propose HaVAT, a novel method based on the Hough transform that automates the interpretation of the RDI. HaVAT also automatically determines the cluster hierarchy and obtains the optimal partition as part of the automated assessment, using a novel scoring mechanism. Our experiments on various datasets demonstrate HaVAT's effectiveness and its superiority over state-of-the-art methods in estimating cluster structure, both in terms of hierarchy and number of clusters. © 2024 ACM.
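The reordering step described in the abstract can be illustrated with a short sketch. This is a minimal pure-Python rendition of the classic VAT ordering (a Prim-style, minimum-spanning-tree traversal of the dissimilarity matrix) followed by a mapping of the reordered matrix to gray levels; the function names `vat_order`, `reorder`, and `rdi`, the 0 to 255 gray scale, and the toy one-dimensional data are assumptions for illustration only. The sketch does not implement HaVAT's Hough-transform interpretation or its scoring mechanism.

```python
from typing import List, Sequence


def vat_order(d: Sequence[Sequence[float]]) -> List[int]:
    """Return a VAT-style permutation of the n x n dissimilarity matrix d.

    Illustrative sketch: start from an object involved in the largest
    dissimilarity, then repeatedly append the unordered object that is
    closest to any already-ordered object (Prim's MST traversal).
    """
    n = len(d)
    if n == 0:
        return []
    # Start from the row holding a maximal entry (lowest index on ties).
    start = max(range(n), key=lambda i: max(d[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # best[q] = smallest dissimilarity from q to any object ordered so far.
    best = {q: d[start][q] for q in remaining}
    while remaining:
        # Nearest unordered object; ties broken by lowest index.
        j = min(remaining, key=lambda q: (best[q], q))
        order.append(j)
        remaining.remove(j)
        for q in remaining:
            best[q] = min(best[q], d[j][q])
    return order


def reorder(d: Sequence[Sequence[float]], order: List[int]) -> List[List[float]]:
    """Permute rows and columns of d by the given order."""
    return [[d[p][q] for q in order] for p in order]


def rdi(d_star: Sequence[Sequence[float]], levels: int = 255) -> List[List[int]]:
    """Scale dissimilarities to gray levels: 0 (black) = lowest, levels (white) = highest."""
    flat = [v for row in d_star for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant matrix
    return [[round(levels * (v - lo) / span) for v in row] for row in d_star]


# Hypothetical toy data: two groups of points on a line.
points = [0.0, 10.0, 0.5, 10.5, 1.0]
dist = [[abs(a - b) for b in points] for a in points]
order = vat_order(dist)  # -> [0, 2, 4, 1, 3]
image = rdi(reorder(dist, order))
```

With these sample points, objects {0, 2, 4} and {1, 3} become contiguous in the ordering, so the resulting image contains two dark blocks on the diagonal separated by bright off-diagonal regions. Reading such blocks automatically, rather than by eye, is the step HaVAT addresses.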

Item Type: Conference Paper
Publication: ACM International Conference Proceeding Series
Publisher: Association for Computing Machinery
Additional Information: The copyright for this article belongs to Association for Computing Machinery.
Keywords: Clustering algorithms; Data visualization; Hough transforms; Matrix algebra; Pattern recognition; Unsupervised learning; Cluster geometries; Cluster structure; Cluster structure assessment; Clusterings; Dissimilarity matrix; Gray-scale images; Number of clusters; Overlapping clusters; Unlabeled data; Visual assessments; Cluster analysis
Department/Centre: Division of Interdisciplinary Sciences > Management Studies
Date Deposited: 04 Mar 2024 07:03
Last Modified: 04 Mar 2024 07:03
URI: https://eprints.iisc.ac.in/id/eprint/84176
