Nazreen, PM and Ramakrishnan, AG and Ghosh, PK (2018) A Joint Enhancement-Decoding Formulation for Noise Robust Phoneme Recognition. In: 14th IEEE India Council International Conference, INDICON 2017, 15 - 17 December 2017, Roorkee.
Abstract
We consider dictionary-based speech enhancement in the context of automatic recognition of noisy speech. As front-end processing, the speech in each analysis frame is denoised using a class-specific (e.g. phoneme-specific) dictionary selected according to the estimated class label. However, when the estimated labels are erroneous, the wrong class model is chosen for many frames. We propose a Joint Enhancement-Decoding (JED) algorithm that overcomes this issue by jointly optimizing over the labels of all frames and the decoding path. The algorithm optimizes over multiple enhanced versions of each frame, obtained with different phoneme-specific dictionaries, and outputs the maximum likelihood state sequence together with the best (in the maximum likelihood sense) choice of enhanced observation sequence. The number of phoneme-specific dictionaries used to enhance an analysis frame is varied from 1 to 5 based on the phoneme confusion matrix, and recognition results are reported for each case. Experiments with the TIMIT corpus and five different noise types at 0, 5 and 10 dB SNR show that recognition performance varies with the number of dictionaries and, in most cases, is best when two or three dictionaries are employed.
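The abstract does not give the JED equations, so the following is only a minimal sketch of the general idea of decoding jointly over a state path and a per-frame choice among several enhanced observations. It assumes a plain HMM in the log domain and assumes that each candidate is scored independently per frame, in which case the joint maximization reduces to a Viterbi pass that takes, for every frame and state, the best-scoring candidate. The function name `joint_viterbi` and the inputs `log_pi`, `log_A` and `log_B` are hypothetical and are not taken from the paper.

```python
def joint_viterbi(log_pi, log_A, log_B):
    """Jointly pick a state path and one enhanced candidate per frame.

    log_pi[s]      : log initial probability of state s
    log_A[i][j]    : log transition probability from state i to state j
    log_B[t][k][s] : log-likelihood of the k-th enhanced version of
                     frame t under state s (hypothetical input layout)
    Returns (states, choices, score).
    """
    T, K, S = len(log_B), len(log_B[0]), len(log_pi)

    # Assumption: candidates are scored independently per frame, so for a
    # fixed state the best candidate at frame t does not depend on other frames.
    best_k = [[max(range(K), key=lambda k: log_B[t][k][s]) for s in range(S)]
              for t in range(T)]
    best_ll = [[log_B[t][best_k[t][s]][s] for s in range(S)] for t in range(T)]

    # Standard Viterbi recursion on the per-state best emission scores.
    delta = [log_pi[s] + best_ll[0][s] for s in range(S)]
    back = []
    for t in range(1, T):
        prev = [max(range(S), key=lambda i: delta[i] + log_A[i][j])
                for j in range(S)]
        delta = [delta[prev[j]] + log_A[prev[j]][j] + best_ll[t][j]
                 for j in range(S)]
        back.append(prev)

    # Backtrace the state path, then read off the candidate chosen per frame.
    last = max(range(S), key=lambda s: delta[s])
    states = [last]
    for prev in reversed(back):
        states.append(prev[states[-1]])
    states.reverse()
    choices = [best_k[t][states[t]] for t in range(T)]
    return states, choices, delta[last]
```

Under these assumptions, `choices[t]` would correspond to the dictionary whose enhanced frame is kept at time `t`, and restricting `K` to between 1 and 5 candidates per frame mirrors the range explored in the abstract. The actual algorithm in the paper may couple the choices differently.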
Item Type: | Conference Paper |
---|---|
Publication: | 2017 14th IEEE India Council International Conference, INDICON 2017 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Additional Information: | The copyright for this article belongs to the IEEE. |
Keywords: | Decoding; Maximum likelihood; Speech enhancement; Automatic recognition; Dictionary learning; Dictionary-based; Front-end processing; Phoneme confusion matrix; Robust speech recognition; Sparse coding; State sequences; Speech recognition |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 03 Aug 2022 06:39 |
Last Modified: | 03 Aug 2022 06:39 |
URI: | https://eprints.iisc.ac.in/id/eprint/75210 |