KVQA: Knowledge-aware visual question answering

Shah, S and Mishra, A and Yadati, N and Talukdar, PP (2019) KVQA: Knowledge-aware visual question answering. In: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Annual Conference on Innovative Applications of Artificial Intelligence, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, 27 January 2019 through 1 February 2019, Honolulu, pp. 8876-8884.


Official URL: https://doi.org/10.1609/aaai.v33i01.33018876

Abstract

Visual Question Answering (VQA) has emerged as an important problem spanning Computer Vision, Natural Language Processing and Artificial Intelligence (AI). In conventional VQA, one may ask questions about an image which can be answered purely based on its content. For example, given an image with people in it, a typical VQA question may inquire about the number of people in the image. More recently, there is growing interest in answering questions which require commonsense knowledge involving common nouns (e.g., cats, dogs, microphones) present in the image. In spite of this progress, the important problem of answering questions requiring world knowledge about named entities (e.g., Barack Obama, White House, United Nations) in the image has not been addressed in prior research. We address this gap in this paper, and introduce KVQA - the first dataset for the task of (world) knowledge-aware VQA. KVQA consists of 183K question-answer pairs involving more than 18K named entities and 24K images. Questions in this dataset require multi-entity, multi-relation, and multi-hop reasoning over large Knowledge Graphs (KG) to arrive at an answer. To the best of our knowledge, KVQA is the largest dataset for exploring VQA over KG. Further, we also provide baseline performances using state-of-the-art methods on KVQA.
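The multi-entity, multi-relation, and multi-hop reasoning mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's method or data: the toy graph, the entity and relation names, and the helper functions `follow_path`, `same_country_of_birth`, and `older_entity` are all hypothetical, and the sketch assumes the named entities in the image have already been identified.

```python
from typing import Dict, List, Optional, Tuple

# Toy knowledge graph: (subject, relation) -> object.
# Every entity, relation, and value here is a hypothetical placeholder,
# not data from KVQA or from any real knowledge graph.
ToyKG = Dict[Tuple[str, str], str]

TOY_KG: ToyKG = {
    ("person_A", "born_in"): "city_X",
    ("person_B", "born_in"): "city_Y",
    ("city_X", "located_in"): "country_P",
    ("city_Y", "located_in"): "country_P",
    ("person_A", "birth_year"): "1950",
    ("person_B", "birth_year"): "1962",
}


def follow_path(kg: ToyKG, entity: str, relations: List[str]) -> Optional[str]:
    """Multi-hop lookup: follow a chain of relations starting at `entity`."""
    current: Optional[str] = entity
    for relation in relations:
        current = kg.get((current, relation))
        if current is None:
            return None  # the chain breaks: a fact is missing from the graph
    return current


def same_country_of_birth(kg: ToyKG, first: str, second: str) -> bool:
    """Multi-entity, multi-hop question: were both entities born in the same country?"""
    path = ["born_in", "located_in"]
    country_first = follow_path(kg, first, path)
    country_second = follow_path(kg, second, path)
    return country_first is not None and country_first == country_second


def older_entity(kg: ToyKG, first: str, second: str) -> Optional[str]:
    """Multi-entity comparison: which of the two entities has the earlier birth year?"""
    year_first = kg.get((first, "birth_year"))
    year_second = kg.get((second, "birth_year"))
    if year_first is None or year_second is None:
        return None
    return first if int(year_first) < int(year_second) else second


if __name__ == "__main__":
    print(follow_path(TOY_KG, "person_A", ["born_in", "located_in"]))
    print(same_country_of_birth(TOY_KG, "person_A", "person_B"))
    print(older_entity(TOY_KG, "person_A", "person_B"))
```

In this sketch a question such as "Were the two people in the image born in the same country?" needs two hops per entity (birthplace, then country) and a comparison across entities, which is the kind of reasoning a conventional VQA model cannot do from image content alone.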

Item Type:	Conference Paper
Publication:	33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019
Publisher:	AAAI Press
Additional Information:	The copyright for this article belongs to AAAI Press.
Keywords:	Knowledge representation; Large dataset; Visual languages; Baseline performance; Commonsense knowledge; Knowledge graphs; Natural language processing; Number of people; Question answering; Question-answer pairs; State-of-the-art methods; Natural language processing systems
Department/Centre:	Division of Electrical Sciences > Computer Science & Automation
Date Deposited:	06 Dec 2022 05:37
Last Modified:	06 Dec 2022 05:37
URI:	https://eprints.iisc.ac.in/id/eprint/78246


