Tripathi, A and Dani, RR and Mishra, A and Chakraborty, A (2023) Multimodal query-guided object localization. In: Multimedia Tools and Applications.
Abstract
Recent studies have demonstrated the effectiveness of using hand-drawn sketches of objects as queries for one-shot object localization. However, crude hand-drawn sketches alone can be ambiguous, which can lead to misidentification, e.g., a sketch of a laptop could be mistaken for a sofa. To overcome this, we propose a novel multimodal approach to object localization that combines sketch queries with linguistic category definitions, allowing for a better representation of visual and semantic cues. Our approach employs a cross-modal attention scheme that guides the region proposal network to obtain relevant proposals. Further, we propose an orthogonal projection-based proposal scoring technique that effectively ranks proposals with respect to the query. We evaluate our method using hand-drawn sketches from the ‘Quick, Draw!’ dataset and glosses from ‘WordNet’ as queries on the widely-used MS-COCO dataset, and achieve superior performance compared to related baselines in both open- and closed-set settings.
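The abstract does not give the scoring formula, so the following is only a generic illustration of what ranking by orthogonal projection can look like, not the method of the paper. It assumes each proposal and the query are already embedded as plain feature vectors, and the helper names (`decompose`, `projection_score`, `rank_proposals`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose(proposal: Sequence[float],
              query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split `proposal` into a part parallel to `query` and an orthogonal residual."""
    q_norm_sq = dot(query, query)
    coeff = dot(proposal, query) / q_norm_sq if q_norm_sq > 0.0 else 0.0
    parallel = [coeff * q for q in query]
    residual = [p - c for p, c in zip(proposal, parallel)]
    return parallel, residual


def projection_score(proposal: Sequence[float], query: Sequence[float]) -> float:
    """Signed share of the proposal's norm that lies along the query direction.

    Assumption for illustration: a proposal is more relevant when more of its
    feature vector survives projection onto the query.
    """
    p_norm = math.sqrt(dot(proposal, proposal))
    if p_norm == 0.0:
        return 0.0
    parallel, _ = decompose(proposal, query)
    sign = 1.0 if dot(proposal, query) >= 0.0 else -1.0
    return sign * math.sqrt(dot(parallel, parallel)) / p_norm


def rank_proposals(proposals: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (proposal index, score) pairs sorted from best to worst."""
    scored = [(i, projection_score(p, query)) for i, p in enumerate(proposals)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-D features: the second proposal is perfectly aligned with the query.
toy_query = [1.0, 0.0]
toy_proposals = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank_proposals(toy_proposals, toy_query)
```

In this toy setup the score reduces to the cosine between proposal and query; a learned scoring head would typically operate on fused sketch and gloss features rather than raw vectors.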
Item Type: | Journal Article |
---|---|
Publication: | Multimedia Tools and Applications |
Publisher: | Springer |
Additional Information: | The copyright for this article belongs to the Author. |
Keywords: | Cross-modal attention; Cross-modal localization; Gloss; Open-set object localization; Sketch. |
Department/Centre: | Division of Interdisciplinary Sciences > Computational and Data Sciences |
Date Deposited: | 25 Jul 2023 06:21 |
Last Modified: | 25 Jul 2023 06:21 |
URI: | https://eprints.iisc.ac.in/id/eprint/82606 |