Goodhart's Law Applies to NLP's Explanation Benchmarks

Hsia, J and Pruthi, D and Singh, A and Lipton, ZC (2024) Goodhart's Law Applies to NLP's Explanation Benchmarks. In: 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Findings of EACL 2024, 17 March 2024 through 22 March 2024, St. Julian's, pp. 1322-1335.

Abstract

Despite the rising popularity of saliency-based explanations, the research community remains at an impasse, facing doubts concerning their purpose, efficacy, and tendency to contradict each other. Seeking to unite the community's efforts around common goals, several recent works have proposed evaluation metrics. In this paper, we critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics, focusing our inquiry on natural language processing. First, we show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs. Our strategy exploits the tendency for extracted explanations and their complements to be "out-of-support" relative to each other and in-distribution inputs. Next, we demonstrate that the EVAL-X metrics can be inflated arbitrarily by a simple method that encodes the label, even though EVAL-X is precisely motivated to address such exploits. Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture. © 2024 Association for Computational Linguistics.
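
The abstract names the two ERASER metrics without defining them. As a rough illustration only, the sketch below uses the definitions commonly given for that benchmark: comprehensiveness is the drop in predicted-class probability when the explanation tokens are removed, and sufficiency is the drop when only the explanation tokens are kept. The function names, the `Model` interface, and `toy_model` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence, Set

# Hypothetical interface (an assumption for this sketch): a model maps a
# token sequence to the probability it assigns to its predicted class.
Model = Callable[[Sequence[str]], float]


def comprehensiveness(model: Model, tokens: Sequence[str], rationale: Set[int]) -> float:
    """Probability drop when the explanation tokens are removed from the input."""
    complement = [t for i, t in enumerate(tokens) if i not in rationale]
    return model(tokens) - model(complement)


def sufficiency(model: Model, tokens: Sequence[str], rationale: Set[int]) -> float:
    """Probability drop when only the explanation tokens are kept."""
    extracted = [t for i, t in enumerate(tokens) if i in rationale]
    return model(tokens) - model(extracted)


def toy_model(tokens: Sequence[str]) -> float:
    """Toy stand-in classifier: confidence grows with the count of 'good'."""
    hits = sum(t == "good" for t in tokens)
    return min(1.0, 0.5 + 0.25 * hits)


tokens = ["a", "good", "good", "film"]
rationale = {1, 2}  # indices of the tokens marked as the explanation

comp = comprehensiveness(toy_model, tokens, rationale)
suff = sufficiency(toy_model, tokens, rationale)
```

Both scores evaluate the model on truncated inputs (the extracted explanation or its complement). Those are the inputs the abstract describes as tending to be "out-of-support", which is why a model's behavior on them can move the scores without changing its predictions on ordinary test inputs.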

Item Type: Conference Paper
Publication: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2024
Publisher: Association for Computational Linguistics (ACL)
Additional Information: The copyright for this article belongs to Association for Computational Linguistics (ACL).
Keywords: Computational linguistics; Current; Evaluation metrics; Language processing; Natural languages; Research communities; Simple method; Test inputs; Natural language processing systems
Department/Centre: Division of Interdisciplinary Sciences > Computational and Data Sciences
Date Deposited: 30 Aug 2024 12:05
Last Modified: 30 Aug 2024 12:05
URI: http://eprints.iisc.ac.in/id/eprint/84924
