ePrints@IISc

An improved goodness of pronunciation (GoP) measure for pronunciation evaluation with DNN-HMM system considering HMM transition probabilities

Sudhakara, S and Ramanathi, MK and Yarra, C and Ghosh, PK (2019) An improved goodness of pronunciation (GoP) measure for pronunciation evaluation with DNN-HMM system considering HMM transition probabilities. In: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, 15 - 19 September 2019, Graz, pp. 954-958.

PDF: INTERSPEECH_2019.pdf (Published Version, 187 kB), restricted to registered users only.
Official URL: https://doi.org/10.21437/Interspeech.2019-2363

Abstract

Goodness of pronunciation (GoP) is typically formulated with Gaussian mixture model-hidden Markov model (GMM-HMM) based acoustic models, using HMM state transition probabilities (STPs) and the GMM likelihoods of context-dependent phonemes. Deep neural network-hidden Markov model (DNN-HMM) based acoustic models, on the other hand, use sub-phonemic (senone) posteriors instead of GMM likelihoods, along with STPs. However, each senone is shared across many HMM states, so there is no one-to-one correspondence between senones and states. To circumvent this, most existing works modify the GoP formulation to use only the senone posteriors, neglecting the STPs. In this work, we derive a GoP formulation for DNN-HMM systems that involves both senone posteriors and STPs. Further, we illustrate the steps to implement the proposed GoP formulation in Kaldi, a state-of-the-art automatic speech recognition toolkit. Experiments are conducted on English data collected from Indian speakers, using acoustic models trained on native English data from the LibriSpeech and Fisher-English corpora. The largest improvement in the correlation coefficient between the GoP scores and the expert ratings is 14.89% (relative) with the proposed approach over the best existing formulation that does not include STPs.
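The mismatch the abstract describes (senone posteriors are indexed by senone, while STPs are indexed by HMM state) can be illustrated with a small Python sketch. It assumes a forced-aligned HMM state path for one phone, a hypothetical state-to-senone map in which two states share one senone, and made-up posteriors, senone priors and transition probabilities. `posterior_only_gop` averages the aligned log posteriors and ignores STPs, in the spirit of the posterior-only formulations mentioned above. `hybrid_path_score` shows the standard hybrid DNN-HMM path score (posteriors divided by senone priors, plus log STPs along the path). Both function names are hypothetical; neither is the GoP derived in the paper, and neither reproduces Kaldi's tools.

```python
import numpy as np


def posterior_only_gop(post, state_path, state_to_senone):
    """Mean log senone posterior along a forced-aligned HMM state path.

    Illustrative posterior-only score (not the paper's GoP): STPs are ignored.
    post: (T, S) array of frame-level senone posteriors P(s | o_t).
    state_path: length-T list of HMM state ids from forced alignment.
    state_to_senone: maps each HMM state id to its (possibly shared) senone id.
    """
    senones = [state_to_senone[j] for j in state_path]
    frame_logpost = np.log(post[np.arange(len(state_path)), senones])
    return float(frame_logpost.mean())


def hybrid_path_score(post, priors, state_path, state_to_senone, log_trans):
    """Duration-normalised hybrid DNN-HMM log score of the same state path.

    Acoustic term: scaled likelihood log P(s | o_t) - log P(s) per frame.
    Transition term: log a(j_{t-1}, j_t), looked up by HMM *state*, not senone.
    log_trans: dict {(from_state, to_state): log transition probability}.
    """
    senones = np.array([state_to_senone[j] for j in state_path])
    T = len(state_path)
    acoustic = np.log(post[np.arange(T), senones]) - np.log(priors[senones])
    transitions = sum(log_trans[(state_path[t - 1], state_path[t])] for t in range(1, T))
    return float((acoustic.sum() + transitions) / T)


# Toy example (all numbers are made up for illustration).
post = np.array([[0.8, 0.1, 0.1],
                 [0.6, 0.3, 0.1],
                 [0.2, 0.7, 0.1],
                 [0.5, 0.4, 0.1]])
priors = np.array([0.5, 0.3, 0.2])
state_path = [0, 0, 1, 2]
state_to_senone = {0: 0, 1: 1, 2: 0}  # states 0 and 2 share senone 0
log_trans = {(0, 0): np.log(0.6), (0, 1): np.log(0.4),
             (1, 1): np.log(0.5), (1, 2): np.log(0.5), (2, 2): np.log(0.7)}

print(posterior_only_gop(post, state_path, state_to_senone))
print(hybrid_path_score(post, priors, state_path, state_to_senone, log_trans))
```

Because senone 0 serves both state 0 and state 2, the posterior alone cannot tell which state emitted a frame. The transition term is the only place where the state identity enters the score.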

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: The copyright for this article belongs to International Speech Communication Association.
Keywords: Computer-aided pronunciation training; DNN-HMM acoustic model; Goodness of pronunciation; Pronunciation evaluation
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 06 Dec 2022 05:44
Last Modified: 06 Dec 2022 05:44
URI: https://eprints.iisc.ac.in/id/eprint/78257
