
A ROBUST SPEECH RATE ESTIMATION BASED ON THE ACTIVATION PROFILE FROM THE SELECTED ACOUSTIC UNIT DICTIONARY

Nagesh, Supriya and Yarra, Chiranjeevi and Deshmukh, Om D and Ghosh, Prasanta Kumar (2016) A ROBUST SPEECH RATE ESTIMATION BASED ON THE ACTIVATION PROFILE FROM THE SELECTED ACOUSTIC UNIT DICTIONARY. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, March 20-25, 2016, Shanghai, People's Republic of China, pp. 5400-5404.

IEEE_Int_Con_Aco_Spe_Sig_Pro_Pro_5400_2016.pdf - Published Version
Restricted to Registered users only
Official URL: http://dx.doi.org/10.1109/ICASSP.2016.7472709

Abstract

A typical approach to speech rate estimation has two stages: first, a short-time feature contour is computed such that most of its peaks correspond to syllable nuclei; second, those peaks are detected. Temporal correlation and selected sub-band correlation (TCSSBC) is often used as the feature contour; it is computed from correlations within and across a few selected sub-band energies. In this work, instead of using a fixed set of sub-bands, we learn them in a data-driven manner using a dictionary learning approach. Similarly, instead of the energy contours, we use the activation profile from the learned dictionary elements. We find that the peaks detected with the data-driven approach significantly improve speech rate estimation when combined with those from the traditional TCSSBC approach using a proposed peak-merging strategy. Experiments are performed separately on the Switchboard, TIMIT and CTIMIT corpora. Except on Switchboard, the correlation coefficient for speech rate estimation using the proposed approach is higher than that of the TCSSBC technique, with relative improvements of 3.1% and 5.2% for TIMIT and CTIMIT, respectively.
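The two-stage idea in the abstract (peaks of a feature contour taken as syllable nuclei, then peaks from two contours combined) can be illustrated with a small sketch. Everything below is an assumption made for illustration only: the abstract does not specify the peak detector, the merging rule, or any parameter values, so the functions `find_peaks`, `merge_peaks` and `speech_rate`, the threshold, the minimum gap, the frame shift and the toy contours are all hypothetical and are not the authors' method.

```python
from typing import List, Sequence


def find_peaks(contour: Sequence[float], threshold: float) -> List[int]:
    """Indices of local maxima above `threshold` (hypothetical detector).

    A frame counts as a peak if it is greater than its left neighbour
    and not smaller than its right neighbour.
    """
    return [
        i
        for i in range(1, len(contour) - 1)
        if contour[i] > threshold
        and contour[i] > contour[i - 1]
        and contour[i] >= contour[i + 1]
    ]


def merge_peaks(a: Sequence[int], b: Sequence[int], min_gap: int) -> List[int]:
    """Union of two peak lists (assumed rule, not the paper's strategy).

    A peak closer than `min_gap` frames to the last kept peak is treated
    as the same syllable nucleus and dropped.
    """
    merged: List[int] = []
    for p in sorted(set(a) | set(b)):
        if not merged or p - merged[-1] >= min_gap:
            merged.append(p)
    return merged


def speech_rate(peaks: Sequence[int], n_frames: int, frame_shift_s: float) -> float:
    """Syllables per second: peak count divided by utterance duration."""
    return len(peaks) / (n_frames * frame_shift_s)


# Toy contours standing in for a TCSSBC contour and a dictionary
# activation profile; the values are made up for illustration.
tcssbc = [0.0, 0.2, 0.9, 0.3, 0.1, 0.1, 0.8, 0.2, 0.0, 0.1, 0.2, 0.1]
activation = [0.0, 0.1, 0.3, 0.8, 0.2, 0.1, 0.2, 0.1, 0.3, 0.9, 0.2, 0.0]

peaks_tcssbc = find_peaks(tcssbc, threshold=0.5)
peaks_activation = find_peaks(activation, threshold=0.5)
merged = merge_peaks(peaks_tcssbc, peaks_activation, min_gap=2)
rate = speech_rate(merged, n_frames=len(tcssbc), frame_shift_s=0.1)
```

In this toy case the second contour contributes a peak that the first one misses, while a near-duplicate peak one frame apart is counted only once; this is the kind of complementary behaviour a merging step is meant to exploit.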

Item Type: Conference Proceedings
Series: International Conference on Acoustics Speech and Signal Processing ICASSP
Additional Information: Copyright for this article belongs to the IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 20 Jan 2017 04:28
Last Modified: 20 Jan 2017 04:28
URI: http://eprints.iisc.ac.in/id/eprint/55937
