3-D acoustic modeling for far-field multi-channel speech recognition

Purushothaman, A and Sreeram, A and Ganapathy, S (2020) 3-D acoustic modeling for far-field multi-channel speech recognition. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 4-8 May 2020, Barcelona, Spain, pp. 6964-6968.

Official URL: https://dx.doi.org/10.1109/ICASSP40776.2020.905448...

Abstract

The conventional approach to automatic speech recognition (ASR) in multi-channel reverberant conditions involves beamforming-based enhancement of the multi-channel speech signal followed by a single-channel neural acoustic model. In this paper, we propose to model the multi-channel signal directly using a convolutional neural network (CNN)-based architecture that performs joint acoustic modeling on the three dimensions of time, frequency and channel. The features input to the 3-D CNN are extracted by modeling the signal peaks in the spatio-spectral domain using a multivariate autoregressive (AR) modeling approach. This AR model efficiently captures the channel correlations in the frequency domain of the multi-channel signal. The experiments are conducted on the CHiME-3 and REVERB Challenge datasets using multi-channel reverberant speech. In these experiments, the proposed 3-D feature and acoustic modeling approach provides significant improvements over an ASR system trained with beamformed audio (average relative word error rate improvements of 16% and 6% on the CHiME-3 and REVERB Challenge datasets, respectively). © 2020 IEEE
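The multivariate autoregressive modeling step described in the abstract can be illustrated with a generic least-squares fit. This is a minimal sketch under stated assumptions, not the authors' feature extractor: `fit_mar` and `mar_spectrum` are hypothetical helper names, ordinary least squares is chosen only for simplicity, and the input is any multi-channel sequence of shape (samples, channels). The spectral matrix it returns holds per-channel power spectra on its diagonal and cross-spectra off the diagonal, which is where inter-channel correlation appears.

```python
import numpy as np


def fit_mar(x, order):
    """Least-squares fit of a multivariate AR model (hypothetical helper).

    x: array of shape (T, C), T samples of a C-channel sequence.
    Returns (A, sigma) where A has shape (order, C, C) so that
    x[t] is approximately sum_k A[k] @ x[t - k - 1], and sigma is the
    (C, C) residual covariance.
    """
    T, C = x.shape
    # Each regression row stacks the `order` most recent past vectors.
    X = np.asarray([
        np.concatenate([x[t - k - 1] for k in range(order)])
        for t in range(order, T)
    ])                              # shape (T - order, order * C)
    Y = x[order:]                   # shape (T - order, C)
    # Ordinary least squares: Y is approximated by X @ W.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Rearrange W (order * C, C) into A[k][i, j]: lag k, output i, input j.
    A = W.T.reshape(C, order, C).transpose(1, 0, 2)
    resid = Y - X @ W
    sigma = resid.T @ resid / len(Y)
    return A, sigma


def mar_spectrum(A, sigma, n_points=64):
    """Spectral matrix S(w) = H(w) sigma H(w)^H on [0, pi], up to a constant scale."""
    order, C, _ = A.shape
    w = np.linspace(0.0, np.pi, n_points)
    S = np.empty((n_points, C, C), dtype=complex)
    for n, wn in enumerate(w):
        # Inverse filter: I - sum_k A_k e^{-j w (k + 1)}.
        inv_filter = np.eye(C, dtype=complex)
        for k in range(order):
            inv_filter -= A[k] * np.exp(-1j * wn * (k + 1))
        H = np.linalg.inv(inv_filter)
        S[n] = H @ sigma @ H.conj().T
    return w, S
```

For example, fitting a two-channel sequence with `order=1` yields `A` of shape `(1, 2, 2)`. In frequency-domain linear prediction, the sequence fed to such a model is a DCT of the signal rather than the signal itself, so the diagonal of `S` then approximates temporal envelopes instead of spectra; the abstract does not state which variant or model order the paper uses.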

Item Type:	Conference Paper
Publication:	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher:	Institute of Electrical and Electronics Engineers Inc.
Additional Information:	Copyright for this article belongs to the IEEE.
Keywords:	Acoustic fields; Audio acoustics; Audio signal processing; Convolutional neural networks; Frequency domain analysis; Reverberation; Speech; Speech communication; Automatic speech recognition; Channel correlation; Conventional approach; Frequency domains; Multivariate autoregressive models; Reverberant condition; Spectral domains; Three dimensions; Speech recognition
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	14 Mar 2021 06:44
Last Modified:	14 Mar 2021 06:44
URI:	http://eprints.iisc.ac.in/id/eprint/66772

	Powered by EPrints		A service from The J.R.D. Tata Memorial Library Indian Institute of Science, Bengaluru-560012, India