
Audiovisual correspondence learning in humans and machines

Krishnamohan, V and Soman, A and Gupta, A and Ganapathy, S (2020) Audiovisual correspondence learning in humans and machines. In: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH, 25 October 2020, Shanghai, China, pp. 4462-4466.

Official URL: https://dx.doi.org/10.21437/Interspeech.2020-2674

Abstract

Audiovisual correspondence learning is the task of acquiring the association between images and their corresponding audio. In this paper, we propose a novel experimental paradigm in which unfamiliar pseudo images and spoken pseudowords are presented to both humans and machine systems. The task is to learn the association between each image-audio pair, which is later evaluated with a retrieval task. The machine system used in the study is pretrained on the ImageNet corpus along with the corresponding audio labels, and is then adapted to the new image-audio pairs through transfer learning. Using the proposed paradigm, we perform a direct comparison of one-shot, two-shot and three-shot learning performance for humans and machine systems. The human behavioral experiment confirms that the majority of the correspondence learning happens in the first exposure to the audio-visual pair. This paper proposes a machine model that performs on par with humans in audiovisual correspondence learning. However, compared with the machine model, humans exhibited better generalization to new input samples after a single exposure. Copyright © 2020 ISCA
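
The retrieval evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model or evaluation code: it assumes that audio and images have already been mapped to embeddings in a shared vector space, that pair i of the audio list matches pair i of the image list, and that cosine similarity is the ranking score. The function names and the toy vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(audio_vec, image_vecs):
    """Return the index of the image embedding most similar to the audio query."""
    scores = [cosine(audio_vec, img) for img in image_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


def retrieval_accuracy(audio_vecs, image_vecs):
    """Fraction of audio queries whose top-ranked image is the paired one.

    Assumes audio_vecs[i] is paired with image_vecs[i].
    """
    hits = sum(retrieve(a, image_vecs) == i for i, a in enumerate(audio_vecs))
    return hits / len(audio_vecs)


# Toy embeddings, made up for illustration only (not data from the paper).
image_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
audio_vecs = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.6, 0.1, 0.5]]

accuracy = retrieval_accuracy(audio_vecs, image_vecs)
print(f"top-1 retrieval accuracy: {accuracy:.2f}")
```

In this toy setup the third audio query is closest to the first image rather than its own pair, so two of the three queries are retrieved correctly. The same scoring could be applied to human responses, treating each participant's chosen image as the retrieved item.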

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: Cited by 0; Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020; Conference Date: 25 October 2020 through 29 October 2020; Conference Code: 165507
Keywords: Audio systems; Audiovisual; Behavioral research; Speech communication; Turing machines; Behavioral experiment; Generalization ability; Input sample; Learning performance; Machine modeling; Machine systems; Pseudowords; Single exposure; Learning systems
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 12 Jan 2021 06:17
Last Modified: 12 Jan 2021 06:17
URI: http://eprints.iisc.ac.in/id/eprint/67644
