ASR inspired syllable stress detection for pronunciation evaluation without using a supervised classifier and syllable level features

Ramanathi, MK and Yarra, C and Ghosh, PK (2019) ASR inspired syllable stress detection for pronunciation evaluation without using a supervised classifier and syllable level features. In: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, 15 September 2019 - 19 September 2019, Graz, pp. 924-928.

Official URL: https://doi.org/10.21437/Interspeech.2019-2091


Automatic syllable stress detection is typically performed with a supervised classifier that relies on manually annotated stress markings and on features computed within syllable segments derived from phoneme transcriptions and their time-aligned boundaries. However, manual annotation is tedious, and errors in estimating the segmental information can degrade stress detection accuracy. To circumvent these problems, we propose to estimate stress markings in an automatic speech recognition (ASR) framework involving a finite-state transducer (FST), without using annotated stress markings or segmental information. For this, we train the ASR system on native English data together with a pronunciation lexicon containing canonical stress markings, and decode non-native utterances as pronunciations embedded with stress markings. During decoding, we use an FST encoded with the pronunciations derived from the phoneme transcriptions and from the instructions followed in a typical manual annotation. Experiments are conducted on polysyllabic words from the ISLE corpus, which contains utterances spoken by Italian and German speakers, using ASR models trained on three corpora. Among the three models, the highest stress detection accuracies of the proposed approach on Italian and German speakers are, respectively, 2.07 and 1.19 higher than, and comparable to, those of the two supervised classification approaches used as baselines.
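
As a rough illustration of the decoding idea described in the abstract, the sketch below enumerates candidate stress-marked pronunciations for one word, of the kind a decoding FST might offer as alternatives. It is not the authors' implementation. The ARPAbet vowel set, the CMUdict-style stress digits (1 for stressed, 0 for unstressed), the rule that exactly one syllable per word is stressed, and the function names `stress_variants` and `stress_pattern` are all assumptions made for this example; the actual annotation instructions and FST construction are not given in this record.

```python
from typing import List

# Assumption: ARPAbet vowel symbols, with stress digits appended CMUdict-style.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def stress_variants(phonemes: List[str]) -> List[List[str]]:
    """Return every pronunciation in which exactly one vowel is marked
    stressed (1) and all other vowels are marked unstressed (0).

    Assumption: one stressed syllable per word; consonants are left unmarked.
    """
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    variants = []
    for stressed in vowel_positions:
        variant = [
            (p + ("1" if i == stressed else "0")) if p in VOWELS else p
            for i, p in enumerate(phonemes)
        ]
        variants.append(variant)
    return variants


def stress_pattern(pronunciation: List[str]) -> List[int]:
    """Read the per-syllable stress labels back from a decoded pronunciation."""
    return [int(p[-1]) for p in pronunciation if p[-1].isdigit()]


# Hypothetical example word: a three-syllable phoneme transcription.
candidates = stress_variants(["B", "AH", "N", "AE", "N", "AH"])
for candidate in candidates:
    print(" ".join(candidate), stress_pattern(candidate))
```

In a setup like the one the abstract describes, a decoder would pick one of these alternatives for each spoken word, and the stress pattern read from the chosen pronunciation would serve as the detected stress marking.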

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: The copyright for this article belongs to International Speech Communication Association.
Keywords: ASR inspired modeling; Computer assisted language learning; Syllable stress detection; Unsupervised approach
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 01 Dec 2022 05:58
Last Modified: 01 Dec 2022 05:58
URI: https://eprints.iisc.ac.in/id/eprint/78123
