
Evaluating Large Language Models for Health-related Queries with Presuppositions

Kaur, N and Choudhury, M and Pruthi, D (2024) Evaluating Large Language Models for Health-related Queries with Presuppositions. In: Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024, 11 August 2024-16 August 2024, Bangkok, pp. 14308-14331.

Official URL: https://aclanthology.org/2024.findings-acl.850.pdf

Abstract

As corporations rush to integrate large language models (LLMs) into their search offerings, it is critical that they provide factually accurate information that is robust to any presuppositions that a user may express. In this work, we introduce UPHILL, a dataset consisting of health-related queries with varying degrees of presuppositions. Using UPHILL, we evaluate the factual accuracy and consistency of InstructGPT, ChatGPT, GPT-4 and Bing Copilot models. We find that while model responses rarely contradict true health claims (posed as questions), all investigated models fail to challenge false claims. Alarmingly, responses from these models agree with 23-32% of the existing false claims and 49-55% of the novel fabricated claims. As we increase the extent of presupposition in input queries, responses from all models except Bing Copilot agree with the claim considerably more often, regardless of its veracity. Given the moderate factual accuracy, and the inability of models to challenge false assumptions, our work calls for a careful assessment of current LLMs for use in high-stakes scenarios. © 2024 Association for Computational Linguistics.

Item Type: Conference Paper
Publication: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Publisher: Association for Computational Linguistics (ACL)
Additional Information: The copyright for this article belongs to the publisher.
Keywords: Computational linguistics; Query languages; Health claims; Language model; Model response; Query response; Structured Query Language
Department/Centre: Division of Interdisciplinary Sciences > Computational and Data Sciences
Date Deposited: 26 Oct 2024 08:28
Last Modified: 26 Oct 2024 08:28
URI: http://eprints.iisc.ac.in/id/eprint/86555
