ePrints@IISc

Multi-label classification from multiple noisy sources using topic models

Padmanabhan, D and Bhat, S and Shevade, S and Narahari, Y (2017) Multi-label classification from multiple noisy sources using topic models. In: Information (Switzerland), 8 (2).

mul_lab_cla_2017.pdf - Published Version

Official URL: https://doi.org/10.3390/info8020052


Multi-label classification is a well-known supervised machine learning setting in which each instance is associated with multiple classes. Examples include annotating images with multiple labels and assigning multiple tags to a web page. Because several labels can be assigned to a single instance, a key challenge is learning the correlations between the classes. Our first contribution assumes that labels come from a perfect source. For this setting, we propose a novel topic model (ML-PA-LDA). Its distinguishing feature is that both the classes that are present and the classes that are absent generate the latent topics and hence the words. Extensive experimentation on real-world datasets reveals the superior performance of the proposed model. A natural way to procure the training dataset is to mine user-generated content or to collect labels directly from users on a crowdsourcing platform. In this more practical crowdsourcing scenario, an additional challenge arises because the labels of the training instances are provided by noisy, heterogeneous crowd workers of unknown quality. With this motivation, we extend our topic model to the scenario where the labels are provided by multiple noisy sources, and refer to the resulting model as ML-PA-LDA-MNS. In experiments with simulated noisy annotators, the proposed model learns the qualities of the annotators well, even with minimal training data.

Item Type: Journal Article
Publication: Information (Switzerland)
Publisher: MDPI AG
Additional Information: The copyright for this article belongs to MDPI AG.
Keywords: Crowdsourcing; Learning systems; Supervised learning; Websites; Crowdsourcing platforms; Multi-label; Multiple source; Real-world datasets; Supervised machine learning; Topic model; User-generated content; Variational inference; Positive ions
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 20 Jul 2022 10:01
Last Modified: 20 Jul 2022 10:01
URI: https://eprints.iisc.ac.in/id/eprint/74917
