ePrints@IISc

Multimodal query-guided object localization

Tripathi, A and Dani, RR and Mishra, A and Chakraborty, A (2023) Multimodal query-guided object localization. In: Multimedia Tools and Applications.

PDF (Published Version): mul_too_app_2023.pdf (1MB)
Official URL: https://doi.org/10.1007/s11042-023-15779-y

Abstract

Recent studies have demonstrated the effectiveness of using hand-drawn sketches of objects as queries for one-shot object localization. However, crude hand-drawn sketches alone can be ambiguous for object localization, which could result in misidentification, e.g., a sketch of a laptop could be confused with a sofa. To overcome this, we propose a novel multimodal approach to object localization that combines sketch queries with linguistic category definitions, allowing for a better representation of visual and semantic cues. Our approach employs a cross-modal attention scheme that guides the region proposal network to obtain relevant proposals. Further, we propose an orthogonal projection-based proposal scoring technique that effectively ranks proposals with respect to the query. We evaluate our method using hand-drawn sketches from the ‘Quick, Draw!’ dataset and glosses from ‘WordNet’ as queries on the widely used MS-COCO dataset, and achieve superior performance compared to related baselines in both open- and closed-set settings.
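
The orthogonal projection-based proposal scoring mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, tensor shapes, and the ratio-based score below are assumptions used only to convey the general idea of ranking proposals by how much of their feature energy lies along the fused query direction.

```python
import torch
import torch.nn.functional as F

def orthogonal_projection_scores(query, proposals, eps=1e-8):
    """Illustrative scoring of region proposals against a multimodal query.

    query:     (d,)   hypothetical fused sketch + gloss embedding
    proposals: (n, d) hypothetical region-proposal features
    returns:   (n,)   scores in [0, 1]; higher means better aligned with the query
    """
    # Unit vector along the query direction.
    q = F.normalize(query, dim=0, eps=eps)
    # Decompose each proposal into a component parallel to the query
    # and an orthogonal residual.
    parallel = (proposals @ q).unsqueeze(1) * q
    residual = proposals - parallel
    par_norm = parallel.norm(dim=1)
    res_norm = residual.norm(dim=1)
    # Score by the fraction of feature energy aligned with the query direction.
    return par_norm / (par_norm + res_norm + eps)
```

Under this (assumed) formulation, proposals whose features are dominated by the query-aligned component rank highest, while proposals with large orthogonal residuals are pushed down the ranking.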

Item Type: Journal Article
Publication: Multimedia Tools and Applications
Publisher: Springer
Additional Information: The copyright for this article belongs to the Author.
Keywords: Cross-modal attention; Cross-modal localization; Gloss; Open-set object localization; Sketch.
Department/Centre: Division of Interdisciplinary Sciences > Computational and Data Sciences
Date Deposited: 25 Jul 2023 06:21
Last Modified: 25 Jul 2023 06:21
URI: https://eprints.iisc.ac.in/id/eprint/82606
