Rao, Achuth M and Ghosh, Prasanta Kumar (2017) Pitch Prediction from Mel-generalized Cepstrum - a Computationally Efficient Pitch Modeling Approach for Speech Synthesis. In: 25th European Signal Processing Conference (EUSIPCO), AUG 28-SEP 02, 2017, GREECE, pp. 1629-1633.
Abstract
Text-to-speech (TTS) systems are often used as part of the user interface in wearable devices. Because wearable devices have limited memory and computational/battery power, it is useful to have a TTS system that requires less memory and is less computationally intensive. Conventional speech synthesis systems have separate models for pitch (F0-model) and for the spectral representation, namely Mel-generalized cepstral coefficients (MGC) (MGC-model). In this paper, we estimate pitch from the MGC estimated by the MGC-model instead of using a separate F0-model. Pitch is obtained from the estimated MGC through a statistical mapping based on a Gaussian mixture model (GMM). Experiments on the CMU-ARCTIC database demonstrate that the proposed GMM-based F0-model, even with a single mixture, results in no significant loss in the naturalness of the synthesized speech. In addition to reducing computational complexity, the proposed F0-model yields an approximately 93% reduction in the number of parameters compared to the conventional separate F0-model.
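The single-mixture case mentioned in the abstract can be illustrated with a minimal sketch. It assumes the mapping is the conditional mean of a joint Gaussian fitted over [MGC, F0] vectors, which is the usual form of GMM-based regression with one component; the abstract does not give the authors' exact formulation. The feature dimension, the synthetic data, and the function names `fit_joint_gaussian` and `predict_f0` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_joint_gaussian(mgc, f0):
    """Fit one full-covariance Gaussian to joint [MGC, F0] vectors."""
    joint = np.hstack([mgc, f0.reshape(-1, 1)])
    mean = joint.mean(axis=0)
    cov = np.cov(joint, rowvar=False)
    return mean, cov


def predict_f0(mgc, mean, cov):
    """Conditional mean E[F0 | MGC] under the joint Gaussian."""
    d = mgc.shape[1]
    mu_x, mu_y = mean[:d], mean[d]
    cov_xx = cov[:d, :d]   # MGC covariance
    cov_yx = cov[d, :d]    # cross-covariance between F0 and MGC
    # Solve cov_xx @ w = cov_yx rather than inverting cov_xx explicitly.
    weights = np.linalg.solve(cov_xx, cov_yx)
    return mu_y + (mgc - mu_x) @ weights


# Synthetic stand-in data (assumption): 25-dimensional MGC frames and an
# F0-like target that depends linearly on them plus a little noise.
rng = np.random.default_rng(0)
dim = 25
mgc_train = rng.normal(size=(2000, dim))
true_w = rng.normal(size=dim)
f0_train = 5.0 + mgc_train @ true_w + 0.01 * rng.normal(size=2000)

mean, cov = fit_joint_gaussian(mgc_train, f0_train)
f0_pred = predict_f0(mgc_train, mean, cov)
mean_abs_err = np.mean(np.abs(f0_pred - f0_train))
```

With one mixture the predictor reduces to a linear map from MGC to F0, so the stored parameters are only the joint mean and covariance. In practice such a mapping would be trained on voiced frames from real speech (often on log-F0) rather than on synthetic data like the above.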
Item Type: | Conference Proceedings |
---|---|
Series: | European Signal Processing Conference |
Publisher: | IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA |
Additional Information: | Copyright for this article belongs to IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 18 Apr 2018 18:22 |
Last Modified: | 18 Apr 2018 18:22 |
URI: | http://eprints.iisc.ac.in/id/eprint/59638 |