ePrints@IIScePrints@IISc Home | About | Browse | Latest Additions | Advanced Search | Contact | Help

On the suitability of the Riesz spectro-temporal envelope for WavENet based speech synthesis

Dhiman, JK and Adiga, N and Seelamantula, CS (2019) On the suitability of the Riesz spectro-temporal envelope for WavENet based speech synthesis. In: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, 15 - 19 September 2019, Graz, pp. 944-948.

[img] PDF
INTERSPEECH_2019.pdf - Published Version
Restricted to Registered users only

Download (1MB) | Request a copy
Official URL: https://doi.org/10.21437/Interspeech.2019-2626


We address the problem of estimating the time-varying spectral envelope of a speech signal using a spectro-temporal demodulation technique. Unlike the conventional spectrogram, we consider a pitch-adaptive spectrogram and model a spectro-temporal patch using an amplitude- and frequency-modulated two-dimensional (2-D) cosine signal. We employ a demodulation technique based on the Riesz transform that we proposed recently to estimate the amplitude and frequency modulations. The amplitude modulation (AM) corresponds to the vocal-tract filter magnitude response (or envelope) and the frequency modulation (FM) corresponds to the excitation. We consider the AM and demonstrate its effectiveness by incorporating it as an acoustic feature for local conditioning in the statistical WaveNet vocoder for the task of speech synthesis. The quality of the synthesized speech obtained with the Riesz envelope is compared with that obtained using the envelope estimated by the WORLD vocoder. Objective measures and subjective listening tests on the CMU-Arctic database show that the quality of synthesis is superior to that obtained using the WORLD envelope. This study thus establishes the Riesz envelope as an efficient alternative to the WORLD envelope.

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: The copyright for this article belongs to International Speech Communication Association.
Keywords: Pitch-adaptive spectrogram; Riesz transform; Spectrogram demodulation; Speech synthesis; WaveNet
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 05 Dec 2022 09:55
Last Modified: 05 Dec 2022 09:55
URI: https://eprints.iisc.ac.in/id/eprint/78252

Actions (login required)

View Item View Item