A Joint Enhancement-Decoding Formulation for Noise Robust Phoneme Recognition

Nazreen, PM and Ramakrishnan, AG and Ghosh, PK (2018) A Joint Enhancement-Decoding Formulation for Noise Robust Phoneme Recognition. In: 14th IEEE India Council International Conference, INDICON 2017, 15 - 17 December 2017, Roorkee.

Official URL: https://doi.org/10.1109/INDICON.2017.8487714

Abstract

We consider dictionary-based speech enhancement in the context of automatic recognition of noisy speech. As front-end processing, the speech in each analysis frame is denoised using a class-specific (e.g., phoneme-specific) dictionary selected according to the estimated class label. However, when the estimated labels are erroneous, the wrong class model is chosen for many frames. We propose a Joint Enhancement-Decoding (JED) algorithm that overcomes this issue by jointly optimizing over the labels of all the frames and the decoding path. The algorithm optimizes over multiple enhanced versions of each frame, obtained using different phoneme-specific dictionaries, and outputs the maximum likelihood state sequence together with the best (in the maximum likelihood sense) choice of the enhanced observation sequence. The number of phoneme-specific dictionaries used for enhancement in an analysis frame is varied from 1 to 5 based on the phoneme confusion matrix, and recognition results are reported for each case. Experiments with the TIMIT corpus and five different noises at SNRs of 0, 5 and 10 dB show that recognition performance varies with the number of dictionaries and, in most cases, is best when two or three dictionaries are employed.
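
The joint search described in the abstract can be illustrated with a small Viterbi-style sketch. This is not the authors' implementation. It assumes that the enhanced version of each frame may be chosen independently of the other frames, so that, for a given state, the best candidate is simply the one with the highest emission log-likelihood. The function name `joint_viterbi` and the layouts of `log_init`, `log_trans` and `log_emit` are hypothetical, chosen only for illustration.

```python
def joint_viterbi(log_init, log_trans, log_emit):
    """Illustrative joint search over state paths and enhanced-frame choices.

    Assumed (hypothetical) input layout:
      log_init[s]        log prior of starting in state s
      log_trans[r][s]    log probability of moving from state r to state s
      log_emit[t][k][s]  log-likelihood of the k-th enhanced version of
                         frame t under state s
    Returns (best_log_score, state_path, candidate_choice_per_frame).
    """
    n_states = len(log_init)
    n_frames = len(log_emit)
    score = [[0.0] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]   # best predecessor state
    pick = [[0] * n_states for _ in range(n_frames)]   # best candidate index

    for t in range(n_frames):
        for s in range(n_states):
            # Best enhanced version of frame t if the frame is in state s.
            cand = [log_emit[t][k][s] for k in range(len(log_emit[t]))]
            k_best = max(range(len(cand)), key=cand.__getitem__)
            pick[t][s] = k_best
            if t == 0:
                prev = log_init[s]
            else:
                # Standard Viterbi recursion over predecessor states.
                r_best = max(range(n_states),
                             key=lambda r: score[t - 1][r] + log_trans[r][s])
                back[t][s] = r_best
                prev = score[t - 1][r_best] + log_trans[r_best][s]
            score[t][s] = prev + cand[k_best]

    # Backtrace: recover the state path and the candidate used in each frame.
    s = max(range(n_states), key=score[-1].__getitem__)
    best = score[-1][s]
    states, choices = [], []
    for t in range(n_frames - 1, -1, -1):
        states.append(s)
        choices.append(pick[t][s])
        s = back[t][s]
    states.reverse()
    choices.reverse()
    return best, states, choices
```

In this sketch, the candidates for each frame would come from the one to five phoneme-specific dictionaries selected for that frame. With a single candidate per frame, the search reduces to ordinary Viterbi decoding on the enhanced observations.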

Item Type:	Conference Paper
Publication:	2017 14th IEEE India Council International Conference, INDICON 2017
Publisher:	Institute of Electrical and Electronics Engineers Inc.
Additional Information:	The copyright for this article belongs to the IEEE.
Keywords:	Decoding; Maximum likelihood; Speech enhancement; Automatic recognition; Dictionary learning; Dictionary-based; Front-end processing; Phoneme confusion matrix; Robust speech recognition; Sparse coding; State sequences; Speech recognition
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	03 Aug 2022 06:39
Last Modified:	03 Aug 2022 06:39
URI:	https://eprints.iisc.ac.in/id/eprint/75210
