Far-Field Speech Recognition Using Multivariate Autoregressive Models

Ganapathy, Sriram and Harish, Madhumita (2018) Far-Field Speech Recognition Using Multivariate Autoregressive Models. In: 19th Annual Conference of the International Speech Communication Association (Interspeech 2018), 2-6 September 2018, Hyderabad, pp. 3023-3027.


Official URL: https://dx.doi.org/10.21437/Interspeech.2018-2003

Abstract

Automatic speech recognition (ASR) in far-field reverberant environments remains challenging even for state-of-the-art recognition systems. The main issue is the signal artifacts caused by long-term reverberation, which results in temporal smearing. The autoregressive (AR) modeling approach to speech feature extraction represents the high-energy regions of the signal, which are less susceptible to noise. In this paper, we propose a novel speech feature extraction method using multivariate AR (MAR) modeling of temporal envelopes. The sub-band discrete cosine transform (DCT) coefficients obtained from multiple speech bands are used in a multivariate linear prediction setting to derive features for speech recognition. For single-channel far-field speech recognition, the features are derived using multi-band linear prediction. For multi-channel far-field speech recognition, we use the multi-channel data in the MAR framework. We perform several speech recognition experiments on the REVERB Challenge database in single- and multi-microphone settings. In these experiments, the proposed feature extraction method provides significant improvements over baseline methods (average relative improvements of 9.7% and 3.9% in single-microphone conditions for clean and multi-condition training, respectively, and 6.3% in multi-microphone conditions). The results with clean training in single-microphone conditions further illustrate the effectiveness of the MAR features.
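The multi-band linear prediction step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the equal-width band split, the least-squares estimator, the model order, and the helper names (`dct2`, `subband_matrix`, `fit_mar`, `mar_envelopes`) are illustrative assumptions. Each sub-band's DCT coefficients form one dimension of a vector sequence, and a vector AR model is fitted jointly across bands. Following the frequency-domain linear prediction (FDLP) idea that an AR model of DCT coefficients approximates the squared Hilbert envelope of the signal, the diagonal of the fitted model's spectral density over [0, π] is read as a per-band temporal envelope across the segment.

```python
import numpy as np

# Illustrative sketch only: band layout, estimator, and model order are assumptions,
# not details taken from the paper.


def dct2(x):
    """Orthonormal DCT-II computed with an explicit cosine basis (fine for short segments)."""
    n_len = len(x)
    n = np.arange(n_len)
    basis = np.cos(np.pi * n[:, None] * (2 * n + 1) / (2 * n_len))
    scale = np.full(n_len, np.sqrt(2.0 / n_len))
    scale[0] = np.sqrt(1.0 / n_len)
    return scale * (basis @ x)


def subband_matrix(x, n_bands):
    """Split the DCT into equal-width bands (an assumption); row t holds coefficient t of every band."""
    c = dct2(np.asarray(x, dtype=float))
    width = len(c) // n_bands
    return c[: width * n_bands].reshape(n_bands, width).T  # shape (width, n_bands)


def fit_mar(y, order):
    """Least-squares vector AR fit: y[t] ~ sum_{k=1..order} A_k @ y[t-k] + e[t]."""
    t_len, d = y.shape
    # Row t of X is [y[t-1], y[t-2], ..., y[t-order]] concatenated.
    X = np.hstack([y[order - k : t_len - k] for k in range(1, order + 1)])
    Y = y[order:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # coef[(k-1)*d : k*d] maps y[t-k] to y[t] in row-vector form, so A_k is its transpose.
    A = np.stack([coef[k * d : (k + 1) * d].T for k in range(order)])
    resid = Y - X @ coef
    sigma = resid.T @ resid / len(Y)  # residual covariance matrix
    return A, sigma


def mar_envelopes(A, sigma, n_points):
    """Diagonal of the MAR spectral density S(w) = H(w) sigma H(w)^H sampled on [0, pi]."""
    order, d, _ = A.shape
    env = np.empty((n_points, d))
    for i, w in enumerate(np.linspace(0.0, np.pi, n_points)):
        # A(w) = I - sum_k A_k exp(-j w k), and H(w) = A(w)^{-1}
        a_w = np.eye(d, dtype=complex)
        for k in range(order):
            a_w -= A[k] * np.exp(-1j * w * (k + 1))
        h_w = np.linalg.inv(a_w)
        env[i] = (h_w @ sigma @ h_w.conj().T).diagonal().real
    return env


# Usage on a synthetic signal whose first half is much louder than its second half.
rng = np.random.default_rng(0)
x = rng.standard_normal(1600) * np.r_[np.ones(800), 0.1 * np.ones(800)]
A, sigma = fit_mar(subband_matrix(x, n_bands=4), order=8)
env = mar_envelopes(A, sigma, n_points=200)  # shape (200, 4): time axis by band
```

In this toy example the envelope of every band is larger over the first half of the time axis, mirroring where the signal energy lies. For the multi-channel case, one could pass a matrix with one column per microphone to the same `fit_mar` routine; how the paper actually combines channels and bands is not specified here. Turning the envelopes into frame-level features (for example, by short-term energy integration and log compression) is outside this sketch.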

Item Type:	Conference Paper
Series:	Interspeech
Publisher:	International Speech Communication Association
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	13 Aug 2020 08:14
Last Modified:	13 Aug 2020 08:14
URI:	http://eprints.iisc.ac.in/id/eprint/62930
