Time-varying sinusoidal demodulation for non-stationary modeling of speech

Sharma, Neeraj Kumar and Sreenivas, Thippur V (2018) Time-varying sinusoidal demodulation for non-stationary modeling of speech. In: Speech Communication, 105. pp. 77-91.

Spe_Com_105_77_2018.pdf - Published Version
Restricted to Registered users only

Official URL: https://doi.org/10.1016/j.specom.2018.10.008


Speech signals have rich, time-evolving spectral content, and accurate analysis of this time-evolving spectrum remains an open challenge in signal processing. To address this, we revisit time-varying sinusoidal modeling of speech and propose an alternative model estimation approach. The estimation operates on the whole signal, without any short-time analysis. The approach first extracts the fundamental frequency sinusoid (FFS) from the speech signal. The instantaneous amplitude (IA) of the FFS is used for voiced/unvoiced stream segregation. The voiced stream is then demodulated using a variant of in-phase and quadrature-phase demodulation carried out at the harmonics of the FFS. The result is a non-parametric time-varying sinusoidal representation: an additive mixture of quasi-harmonic sinusoids for the voiced stream and a wideband mono-component sinusoid for the unvoiced stream. The representation is evaluated for analysis-synthesis, and the bandwidths of the IA and instantaneous frequency (IF) signals are found to be crucial in preserving quality. The obtained IA and IF signals are also found to carry perceived speech attributes, such as speaker characteristics and intelligibility. Compared with existing approaches, which operate on short-time segments, the proposed modeling framework shows improvements in simplicity of implementation, objective scores, and computation time. Listening test scores suggest that the synthesized speech preserves naturalness but does not yet beat state-of-the-art short-time analysis methods. In summary, the proposed representation lends itself to high-resolution temporal analysis of non-stationary speech signals, and also allows quality-preserving modification and synthesis.
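
The demodulation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract mentions "a variant of" in-phase and quadrature-phase demodulation, whereas the code below is the textbook form, applied to a synthetic three-harmonic chirp rather than speech. It also assumes the phase of the fundamental (`phi0`) is already known, whereas the paper extracts the FFS from the signal itself. The function names `lowpass_fir` and `iq_demodulate`, the Blackman-windowed filter, and the 30 Hz cutoff are all hypothetical choices made for illustration.

```python
import numpy as np

def lowpass_fir(cutoff_hz, fs, num_taps=401):
    """Windowed-sinc low-pass FIR filter (Blackman window), unit DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.blackman(num_taps)
    return h / h.sum()

def iq_demodulate(x, carrier_phase, fs, cutoff_hz=30.0):
    """Estimate the IA and IF of the component of x near a time-varying carrier."""
    h = lowpass_fir(cutoff_hz, fs)
    # Mixing with exp(-j*phase) gives the in-phase (real) and quadrature
    # (imaginary) branches in one step; the low-pass keeps the baseband term.
    baseband = x * np.exp(-1j * carrier_phase)
    z = np.convolve(baseband, h, mode="same")
    inst_amp = 2.0 * np.abs(z)
    # Total phase = carrier phase + residual phase left after mixing.
    total_phase = carrier_phase + np.unwrap(np.angle(z))
    inst_freq = np.gradient(total_phase) * fs / (2 * np.pi)
    return inst_amp, inst_freq

# Synthetic "voiced" signal (an assumption, not speech): three harmonics of a
# fundamental that glides linearly from 120 Hz to 140 Hz over one second.
fs = 8000
t = np.arange(fs) / fs
f0 = 120.0 + 20.0 * t
phi0 = 2 * np.pi * (120.0 * t + 10.0 * t**2)  # integral of 2*pi*f0
amps = [1.0, 0.5, 0.25]
x = sum(a * np.cos((k + 1) * phi0 + 0.3 * k) for k, a in enumerate(amps))

# Demodulate at the second harmonic of the fundamental.
ia2, if2 = iq_demodulate(x, 2 * phi0, fs)
```

Away from the signal edges, where the zero-padded convolution distorts the estimates, `ia2` should sit close to the second harmonic's amplitude of 0.5 and `if2` should track `2 * f0`. Repeating the call with `k * phi0` for each harmonic `k` would give one IA/IF pair per quasi-harmonic component, which is the kind of additive representation the abstract describes for the voiced stream.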

Item Type: Journal Article
Additional Information: Copyright of this article belongs to ELSEVIER SCIENCE BV
Keywords: Speech modeling; Sinusoidal modeling; Speech analysis; Speech synthesis; Harmonic demodulation; Subband modeling
Department/Centre: Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited: 06 Feb 2019 05:57
Last Modified: 06 Feb 2019 05:57
URI: http://eprints.iisc.ac.in/id/eprint/61597
