
A mode-shape classification technique for robust speech rate estimation and syllable nuclei detection

Yarra, Chiranjeevi and Deshmukh, Om D and Ghosh, Prasanta Kumar (2016) A mode-shape classification technique for robust speech rate estimation and syllable nuclei detection. In: Speech Communication, 78, pp. 62-71.

Spe_Com_78_62_2016.pdf - Published Version

Official URL: http://dx.doi.org/10.1016/j.specom.2016.01.004

Abstract

Acoustic-feature-based speech (syllable) rate estimation and syllable nuclei detection are important problems in automatic speech recognition (ASR), computer-assisted language learning (CALL) and fluency analysis. A typical solution to both problems consists of two stages. The first stage computes a short-time feature contour such that most of the contour's peaks correspond to syllable nuclei. The second stage detects the peaks that correspond to the syllable nuclei. In this work, instead of peak detection, we perform mode-shape classification, formulated as a supervised binary classification problem: mode-shapes representing syllable nuclei form one class and the remaining mode-shapes form the other. We use the temporal correlation and selected sub-band correlation (TCSSBC) feature contour, and the mode-shapes in this contour are converted into a set of feature vectors using an interpolation technique. A support vector machine classifier is used for the classification. Experiments are performed separately on the Switchboard, TIMIT and CTIMIT corpora in a five-fold cross-validation setup. The average correlation coefficients for syllable rate estimation are 0.6761, 0.6928 and 0.3604 for the three corpora, respectively, which exceed those obtained by the best of the existing peak detection techniques. Similarly, the average syllable-level F-scores for syllable nuclei detection are 0.8917, 0.8200 and 0.7637 for the three corpora, respectively. © 2016 Elsevier B.V. All rights reserved.
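The conversion of mode-shapes into fixed-length feature vectors can be illustrated with a minimal sketch. The abstract does not give the paper's exact mode-shape definition or interpolation scheme, so this sketch rests on two assumptions: a mode-shape is the contour segment between two consecutive local minima, and each segment is linearly interpolated onto a fixed number of points. The function names `local_minima`, `resample` and `mode_shape_vectors`, and the parameter `n_points`, are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def local_minima(contour: Sequence[float]) -> List[int]:
    """Return valley indices; both endpoints always count as boundaries (assumption)."""
    idx = [0]
    for i in range(1, len(contour) - 1):
        # Strictly lower than the left neighbour, no higher than the right one,
        # so a flat valley contributes only its first sample.
        if contour[i] < contour[i - 1] and contour[i] <= contour[i + 1]:
            idx.append(i)
    idx.append(len(contour) - 1)
    return idx


def resample(segment: Sequence[float], n_points: int) -> List[float]:
    """Linearly interpolate a segment onto n_points evenly spaced positions."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = (len(segment) - 1) / (n_points - 1)
    out = []
    for k in range(n_points):
        pos = k * step
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append((1 - frac) * segment[lo] + frac * segment[hi])
    return out


def mode_shape_vectors(contour: Sequence[float], n_points: int = 8) -> List[List[float]]:
    """Cut the contour at its valleys and resample each piece to a fixed length."""
    bounds = local_minima(contour)
    return [
        resample(contour[a:b + 1], n_points)
        for a, b in zip(bounds, bounds[1:])
        if b > a
    ]


# Toy contour with two humps; each hump becomes one fixed-length vector.
toy_contour = [0, 1, 3, 1, 0.5, 2, 4, 2, 0]
vectors = mode_shape_vectors(toy_contour, n_points=3)
```

Each resulting vector has the same length regardless of how long the original segment was, which is what allows a standard binary classifier such as a support vector machine to label segments as syllable nuclei or not. The classifier itself and the TCSSBC contour computation are outside the scope of this sketch.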

Item Type: Journal Article
Publication: SPEECH COMMUNICATION
Publisher: ELSEVIER SCIENCE BV
Additional Information: Copyright for this article belongs to Elsevier Science BV, PO Box 211, 1000 AE Amsterdam, Netherlands
Keywords: Speech rate estimation; Syllable nuclei detection; TCSSBC; Mode-shape classification; Mode-shape feature vectors; Peak detection
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 11 May 2016 06:55
Last Modified: 11 May 2016 06:55
URI: http://eprints.iisc.ac.in/id/eprint/53785
