Agrawal, P and Ganapathy, S (2017) Speech representation learning using unsupervised data-driven modulation filtering for robust ASR. In: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 20-24 August 2017, Stockholm, pp. 2446-2450.
Abstract
The performance of an automatic speech recognition (ASR) system degrades severely in noisy and reverberant environments, in part due to the lack of robustness in the underlying representations used by the ASR system. On the other hand, auditory processing studies have shown the importance of modulation-filtered spectrogram representations in robust human speech recognition. Inspired by this evidence, we propose a speech representation learning paradigm using data-driven 2-D spectro-temporal modulation filter learning. In particular, multiple representations are derived from the input speech spectrogram in an unsupervised manner using the convolutional restricted Boltzmann machine (CRBM) model. A filter selection criterion based on the average number of active hidden units is also employed to select the representations for ASR. The experiments are performed on the Wall Street Journal (WSJ) Aurora-4 database with clean and multi-condition training setups. In these experiments, the ASR results obtained from the proposed modulation filtering approach show significant robustness to noise and channel distortions compared to other feature extraction methods (average relative improvement of 19% over baseline features in clean training). Furthermore, ASR experiments performed on reverberant speech data from the REVERB challenge corpus highlight the benefits of the proposed representation learning scheme for far-field speech recognition.
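The pipeline described in the abstract, unsupervised 2-D spectro-temporal modulation filters applied to a spectrogram and a selection step based on the average number of active hidden units, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the sigmoid hidden-unit model, the 0.5 activity threshold, the filter sizes, and all variable names are assumptions made only to show the shape of the computation.

```python
# Minimal illustrative sketch (not the authors' code): apply candidate 2-D
# spectro-temporal modulation filters to a log-mel spectrogram and rank them
# by the average fraction of "active" hidden units, a stand-in for the
# CRBM-based filter selection criterion described in the abstract.
import numpy as np
from scipy.signal import convolve2d

def hidden_activations(log_mel, filt, bias=0.0):
    """Convolve a 2-D modulation filter with the spectrogram and apply a
    sigmoid, mimicking the hidden-layer response of a convolutional RBM."""
    response = convolve2d(log_mel, filt, mode="valid") + bias
    return 1.0 / (1.0 + np.exp(-response))  # sigmoid activations in [0, 1]

def selection_score(log_mel, filt, threshold=0.5):
    """Average fraction of active hidden units (activation > threshold);
    the 0.5 threshold is an assumption for illustration."""
    h = hidden_activations(log_mel, filt)
    return float((h > threshold).mean())

# Toy example: rank two random candidate filters on a random "spectrogram".
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((40, 200))              # (mel bands, frames)
filters = [rng.standard_normal((5, 9)) * 0.1 for _ in range(2)]
scores = [selection_score(log_mel, f) for f in filters]
print(f"scores={scores}, selected filter index={int(np.argmax(scores))}")
```

In the paper the candidate filters are learned by training a CRBM on speech spectrograms rather than drawn at random; the sketch only mirrors the filtering and selection steps named in the abstract.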
Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: The copyright for this article belongs to the International Speech Communication Association.
Keywords: Convolution; Modulation; Reverberation; Spectrographs; Speech; Speech communication; Unsupervised learning; Automatic speech recognition system; Feature extraction methods; Modulation filtering; Multi-condition training; Multiple representations; Restricted Boltzmann machine; Reverberant environment; Spectro-temporal modulations; Speech recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 22 Jul 2022 10:48
Last Modified: 22 Jul 2022 10:48
URI: https://eprints.iisc.ac.in/id/eprint/74709