Unsupervised neural mask estimator for generalized eigen-value beamforming based ASR

Kumar, R and Sreeram, A and Purushothaman, A and Ganapathy, S (2020) Unsupervised neural mask estimator for generalized eigen-value beamforming based ASR. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 4-8 May 2020, Barcelona, Spain, pp. 7494-7498.

PDF
ICASSP_IEEE_7494-7498.pdf - Published Version
Restricted to Registered users only

Official URL: https://dx.doi.org/10.1109/ICASSP40776.2020.905455...

Abstract

The state-of-the-art methods for acoustic beamforming in multi-channel ASR are based on a neural mask estimator that predicts the presence of speech and noise. These models are trained using a paired corpus of clean and noisy recordings (teacher model). In this paper, we attempt to move away from the requirement of clean recordings for supervised training of the mask estimator. Instead, models based on signal enhancement and beamforming using multi-channel linear prediction provide the mask estimates required for training. In this way, the model can also be trained on real recordings of noisy speech, rather than only on the simulated recordings used by a typical teacher model. Experiments on noisy and reverberant environments in the CHiME-3 corpus and the REVERB challenge corpus highlight the effectiveness of the proposed approach. The proposed approach yields ASR performance that is significantly better than a teacher model trained on an out-of-domain dataset and on par with oracle mask estimators trained on the in-domain dataset. © 2020 IEEE
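The abstract leaves the beamformer itself implicit. As a minimal sketch, assuming the standard mask-based GEV formulation (speech and noise spatial covariance matrices estimated by mask-weighted averaging of multichannel STFT frames, with the beamformer taken as the principal generalized eigenvector of that matrix pair), the stage that consumes the estimated masks could look like the following NumPy code. The function names, the `(channels, frames, freqs)` layout, and the diagonal-loading constant `eps` are illustrative assumptions rather than the authors' implementation, and the paper's unsupervised mask-target generation is not reproduced here.

```python
import numpy as np

# Hypothetical helper names and array layouts: a generic mask-based GEV sketch,
# not the code used in the paper.


def masked_psd(stft, mask):
    """Mask-weighted spatial covariance matrix for every frequency bin.

    stft: complex array of shape (channels, frames, freqs)
    mask: real array in [0, 1] of shape (frames, freqs)
    returns: complex array of shape (freqs, channels, channels)
    """
    # Phi[f] = sum_t mask[t, f] * y[:, t, f] y[:, t, f]^H, normalised by the mask mass.
    psd = np.einsum("tf,ctf,dtf->fcd", mask, stft, stft.conj())
    norm = np.maximum(mask.sum(axis=0), 1e-10)  # shape (freqs,), avoids division by zero
    return psd / norm[:, None, None]


def gev_weights(psd_speech, psd_noise, eps=1e-6):
    """Principal generalized eigenvector of (psd_speech, psd_noise) per bin.

    Maximises (w^H Phi_SS w) / (w^H Phi_NN w) by whitening with the Cholesky
    factor of the diagonally loaded noise covariance (eps is an assumed value).
    """
    n_freq, n_ch, _ = psd_speech.shape
    eye = np.eye(n_ch)
    weights = np.empty((n_freq, n_ch), dtype=complex)
    for f in range(n_freq):
        load = eps * max(np.trace(psd_noise[f]).real / n_ch, 1e-10)
        chol = np.linalg.cholesky(psd_noise[f] + load * eye)  # Phi_NN = L L^H
        chol_inv = np.linalg.inv(chol)
        # Whitened speech covariance L^{-1} Phi_SS L^{-H} is Hermitian.
        whitened = chol_inv @ psd_speech[f] @ chol_inv.conj().T
        _, vecs = np.linalg.eigh(whitened)  # eigenvalues in ascending order
        # Map the top eigenvector back to the original space: w = L^{-H} v.
        weights[f] = chol_inv.conj().T @ vecs[:, -1]
    return weights


def apply_beamformer(weights, stft):
    """Beamformer output w(f)^H y(t, f), shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

With a single speech-presence mask `m` from the estimator, one simple choice (again an assumption) is `masked_psd(Y, m)` for speech and `masked_psd(Y, 1 - m)` for noise before calling `gev_weights` and `apply_beamformer`. Postfiltering such as blind analytic normalization is left out of this sketch.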

Item Type:	Conference Paper
Publication:	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher:	Institute of Electrical and Electronics Engineers Inc.
Additional Information:	cited By 0; Conference of 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 ; Conference Date: 4 May 2020 Through 8 May 2020; Conference Code:161907
Keywords:	Audio recordings; Beamforming; Personnel training; Speech communication, Acoustic beamforming; Linear prediction; Model training; Noisy recordings; Reverberant environment; Signal enhancement; State-of-art methods; Teacher models, Audio signal processing
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	02 Dec 2020 10:32
Last Modified:	02 Dec 2020 10:32
URI:	http://eprints.iisc.ac.in/id/eprint/66773
