
Noise robust speech rate estimation using signal-to-noise ratio dependent sub-band selection and peak detection strategy

Yarra, Chiranjeevi and Nagesh, Supriya and Deshmukh, Om D and Ghosh, Prasanta Kumar (2019) Noise robust speech rate estimation using signal-to-noise ratio dependent sub-band selection and peak detection strategy. In: JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 146 (3). pp. 1615-1628.

PDF: J_Aco_Soc_Ame_146-3_1615.pdf - Published Version (1MB)
Restricted to registered users only
Official URL: https://dx.doi.org/10.1121/1.5124473

Abstract

Speech (syllable) rate estimation typically involves computing a feature contour, based on sub-band energies, that has strong local maxima (peaks) at syllable nuclei; these peaks are detected with the help of voicing decisions (VDs). While such a two-stage scheme works well in clean conditions, the estimated speech rate becomes less accurate in noisy conditions, particularly at low signal-to-noise ratios (SNRs), mainly because of erroneous VDs and non-informative sub-bands. This work proposes a technique to use VDs in the peak detection strategy in an SNR-dependent manner. It also proposes a data-driven sub-band pruning technique to improve the syllabic peaks of the feature contour in the presence of noise. Further, this paper generalizes both the peak detection and the sub-band pruning techniques to unknown noise and/or unknown SNR conditions. Experiments are performed separately in clean and in 20, 10, and 0 dB SNR conditions using the Switchboard, TIMIT, and CTIMIT corpora under five additive noises: white, car, high-frequency-channel, cockpit, and babble. Experiments are also carried out in test conditions at unseen SNRs of -5 and 5 dB with four unseen additive noises: factory, subway, street, and exhibition. The proposed method outperforms the best of the existing techniques in clean and noisy conditions on all three corpora.
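
The noisy test conditions named in the abstract (for example 20, 10, 0, and -5 dB SNR with additive noise) can be illustrated with a minimal sketch. It assumes the common definition of SNR as the ratio of average speech power to average noise power over the whole utterance; the abstract does not state how the authors mixed their data, so this is not their procedure. The function names mix_at_snr and measured_snr_db are hypothetical and chosen only for this example.

```python
import math


def _mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to a target SNR in dB.

    Assumption (not from the paper): SNR is measured over the whole
    utterance as 10*log10(speech power / noise power).
    Returns (noisy_signal, scaled_noise).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]  # trim noise to the utterance length
    p_speech = _mean_power(speech)
    p_noise = _mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Choose gain g so that p_speech / (g**2 * p_noise) == 10**(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def measured_snr_db(speech, noise):
    """SNR in dB of a speech signal against a given noise signal."""
    return 10 * math.log10(_mean_power(speech) / _mean_power(noise))
```

Calling mix_at_snr(speech, noise, 0) yields a mixture in which the scaled noise carries the same average power as the speech, and negative targets such as -5 dB make the noise stronger than the speech.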

Item Type: Journal Article
Publication: JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA
Publisher: ACOUSTICAL SOC AMER AMER INST PHYSICS
Additional Information: Copyright of this article belongs to ACOUSTICAL SOC AMER AMER INST PHYSICS
Keywords: 2ND-LANGUAGE LEARNERS FLUENCY; SYLLABLE NUCLEI; QUANTITATIVE ASSESSMENT; LOW-COMPLEXITY; SPEAKING RATE; RECOGNITION; TRACKING; DATABASE; READ
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 04 Dec 2019 10:43
Last Modified: 04 Dec 2019 10:43
URI: http://eprints.iisc.ac.in/id/eprint/63863
