From strings to things: Knowledge-enabled VQA model that can read and reason

Singh, AK and Mishra, A and Shekhar, S and Chakraborty, A (2019) From strings to things: Knowledge-enabled VQA model that can read and reason. In: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019, 27 Oct-2 Nov, 2019, South Korea, pp. 4601-4611.

Official URL: https://dx.doi.org/10.1109/ICCV.2019.00470

Abstract

Text present in an image is not merely a string; it provides useful cues about the image. Despite its utility for image understanding, scene text is not used in traditional visual question answering (VQA) models. In this work, we present a VQA model that can read scene text and reason over a knowledge graph to arrive at an accurate answer. Our proposed model has three mutually interacting modules: (i) a proposal module, which extracts word and visual content proposals from the image; (ii) a fusion module, which fuses these proposals with the question and the knowledge base to mine relevant facts and represents these facts as a multi-relational graph; and (iii) a reasoning module, which performs a novel gated graph neural network based reasoning on this graph. The performance of our knowledge-enabled VQA model is evaluated on our newly introduced dataset, text-KVQA. To the best of our knowledge, this is the first dataset that identifies the need for bridging text recognition with knowledge graph based reasoning. Through extensive experiments, we show that our proposed method outperforms traditional VQA methods as well as methods for question answering over knowledge bases on text-KVQA.
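
The abstract describes representing mined facts as a multi-relational graph and reasoning over it with a gated graph neural network. The sketch below illustrates that general idea only and is not the authors' implementation: the facts are made-up toy triples, the names `build_graph`, `propagate` and `rel_weight` are hypothetical, and the gate is a simplified element-wise update rather than the full GRU cell used in standard gated graph neural networks.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_graph(facts):
    """Group (subject, relation, object) triples into per-relation edge lists."""
    nodes = sorted({n for s, _, o in facts for n in (s, o)})
    edges = {}
    for s, r, o in facts:
        edges.setdefault(r, []).append((s, o))
    return nodes, edges


def propagate(states, edges, rel_weight):
    """One simplified gated message-passing step over a multi-relational graph."""
    dim = len(next(iter(states.values())))
    # Each node sums the states of its in-neighbours, scaled per relation type.
    messages = {n: [0.0] * dim for n in states}
    for rel, pairs in edges.items():
        w = rel_weight[rel]
        for src, dst in pairs:
            for i in range(dim):
                messages[dst][i] += w * states[src][i]
    # Element-wise gate: z decides how much of the old state is overwritten.
    new_states = {}
    for n, h in states.items():
        m = messages[n]
        new_h = []
        for i in range(dim):
            z = sigmoid(m[i] + h[i])
            new_h.append((1.0 - z) * h[i] + z * math.tanh(m[i]))
        new_states[n] = new_h
    return new_states


# Toy facts, invented for illustration only.
facts = [
    ("Subway", "is_a", "restaurant"),
    ("Subway", "serves", "sandwich"),
]
nodes, edges = build_graph(facts)

# Assumed initial states: the node matching the recognised scene text is "on".
states = {n: [0.0, 0.0] for n in nodes}
states["Subway"] = [1.0, 0.0]

# Assumed fixed per-relation weights; a trained model would learn these.
rel_weight = {"is_a": 0.5, "serves": 0.8}
new_states = propagate(states, edges, rel_weight)
```

After one step, information from the node matched to the scene text has reached its neighbours along typed edges; repeating the step spreads it further before an answer is read out from the node states.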

Item Type: Conference Paper
Publication: Proceedings of the IEEE International Conference on Computer Vision
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: Conference: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019; Conference Date: 27 October 2019 through 2 November 2019; Conference Code: 158036
Department/Centre: Division of Interdisciplinary Sciences > Computational and Data Sciences
Date Deposited: 18 Aug 2020 05:17
Last Modified: 18 Aug 2020 05:17
URI: http://eprints.iisc.ac.in/id/eprint/65007