
Subband selection for binaural speech source localization

Karthik, GR and Ghosh, PK (2017) Subband selection for binaural speech source localization. In: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 20 - 24 August 2017, Stockholm, pp. 1929-1933.

Official URL: https://doi.org/10.21437/Interspeech.2017-954


We consider the task of speech source localization using binaural cues, namely the interaural time difference (ITD) and the interaural level difference (ILD). A typical approach is to process binaural speech using gammatone filters and calculate frame-level ITD and ILD in each subband. During training, the ITD, the ILD and their combination (ITLD) in each subband are statistically modelled using Gaussian mixture models for every direction. Given a binaural test speech signal, the source is localized using a maximum likelihood criterion, assuming that the binaural cues in each subband are independent. In this work, we investigate the robustness of each subband for localization and compare the performance of individual subbands against that of the full-band scheme with 32 gammatone filters. We propose a subband selection procedure that uses the training data to rank-order subbands by their localization performance. Experiments on Subject 003 from the CIPIC database reveal that, at high SNRs, the ITD and the ITLD of just one subband centered at 296 Hz are each sufficient to yield localization accuracy identical to that of the full-band scheme for test speech of duration 1 s. At low SNRs, in the case of ITD, the selected subbands are found to perform better than the full-band scheme.
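The maximum likelihood decision and the subband ranking described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and it makes several loud assumptions: a single Gaussian stands in for each per-subband Gaussian mixture model, the frame-level cue values are synthetic rather than computed from gammatone-filtered CIPIC data, only four toy subbands and three directions are used, and all function names (`fit_models`, `localize`, `rank_subbands`, `synth_utterance`) are hypothetical.

```python
import math
import random

def gauss_loglik(x, mean, var):
    """Log-density of a 1-D Gaussian (assumed stand-in for a per-subband GMM)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def fit_models(train):
    """train[direction][band] is a list of frame-level cue values.
    Returns models[direction][band] = (mean, variance)."""
    models = {}
    for direction, bands in train.items():
        models[direction] = []
        for frames in bands:
            mean = sum(frames) / len(frames)
            var = sum((x - mean) ** 2 for x in frames) / len(frames) + 1e-9
            models[direction].append((mean, var))
    return models

def localize(models, utterance, bands):
    """Return the direction with the highest summed log-likelihood.
    Summing over frames and bands encodes the independence assumption."""
    def score(direction):
        return sum(gauss_loglik(x, *models[direction][b])
                   for b in bands for x in utterance[b])
    return max(models, key=score)

def rank_subbands(models, labelled_utterances):
    """Rank subbands by the accuracy each one achieves when used alone."""
    n_bands = len(next(iter(models.values())))
    accuracy = []
    for b in range(n_bands):
        hits = sum(localize(models, utt, [b]) == d
                   for d, utt in labelled_utterances)
        accuracy.append(hits / len(labelled_utterances))
    ranking = sorted(range(n_bands), key=lambda b: accuracy[b], reverse=True)
    return ranking, accuracy

# Synthetic toy data (assumption): only band 0 carries directional information.
random.seed(0)
DIRECTION_MEANS = {-45: -1.0, 0: 0.0, 45: 1.0}  # toy cue mean per direction
N_BANDS = 4

def synth_utterance(direction, n_frames):
    """One utterance: a list of per-band lists of frame-level cue values."""
    informative = [random.gauss(DIRECTION_MEANS[direction], 0.1)
                   for _ in range(n_frames)]
    noisy = [[random.gauss(0.0, 1.0) for _ in range(n_frames)]
             for _ in range(N_BANDS - 1)]
    return [informative] + noisy

train = {d: synth_utterance(d, 200) for d in DIRECTION_MEANS}
models = fit_models(train)

# Labelled utterances used for ranking, drawn from the same toy generator.
labelled = [(d, synth_utterance(d, 50)) for d in DIRECTION_MEANS for _ in range(5)]
ranking, accuracy = rank_subbands(models, labelled)

# Compare the single top-ranked subband with the scheme that uses all bands.
selected_hits = sum(localize(models, utt, ranking[:1]) == d for d, utt in labelled)
full_band_hits = sum(localize(models, utt, range(N_BANDS)) == d for d, utt in labelled)
print("ranking:", ranking)
print("per-band accuracy:", accuracy)
print("selected vs full-band hits:", selected_hits, full_band_hits)
```

In this toy setup the ranking step simply scores each subband on labelled utterances and sorts by accuracy; with real data, the per-band models would be GMMs over ITD, ILD or ITLD features, and the number of retained subbands would be a tuning choice.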

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: The copyright for this article belongs to International Speech Communication Association
Keywords: Maximum likelihood; Speech; Gammatone filters; Gaussian Mixture Model; Interaural level differences; Interaural time differences; Localization accuracy; Localization performance; Maximum likelihood criteria; Source localization; Speech communication
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 25 Jul 2022 05:24
Last Modified: 25 Jul 2022 05:24
URI: https://eprints.iisc.ac.in/id/eprint/74716
