Explainable Offensive Language Classifier

Kohli, A and Devi, VS (2024) Explainable Offensive Language Classifier. In: Communications in Computer and Information Science, 1962 C (1). pp. 299-313.

PDF
acs_ome_9_1_2024.pdf - Published Version

Official URL: https://doi.org/10.1007/978-981-99-8132-8_23

Abstract

Offensive content on social media has become a serious problem, making its automatic detection a crucial task. Deep learning approaches to Natural Language Processing (NLP) have been shown to match or even exceed human-level accuracy on offensive language detection tasks, which justifies deploying deep learning models for them. However, unlike humans, these models lack one key property: explainability. In this paper, we provide an explainable model for offensive language detection in a multi-task learning setting. Our model achieved an F1 score of 0.78 on the OLID dataset and 0.85 on the SOLID dataset. We also provide a detailed analysis of the model's interpretability. © 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Item Type:	Journal Article
Publication:	Communications in Computer and Information Science
Publisher:	Springer Science and Business Media Deutschland GmbH
Additional Information:	The copyright for this article belongs to Author
Keywords:	Deep learning; Learning algorithms; Learning systems; Automatic detection; Explainability and interpretability; Interpretability; Language detection; Language processing; Learning approach; Natural languages; Offensive languages; Social media; Natural language processing systems
Department/Centre:	Division of Biological Sciences > Biochemistry
Date Deposited:	28 Feb 2024 13:28
Last Modified:	28 Feb 2024 13:28
URI:	https://eprints.iisc.ac.in/id/eprint/83711
