SFNet: A Computationally Efficient Source Filter Model Based Neural Speech Synthesis

Mv, AR and Ghosh, PK (2020) SFNet: A Computationally Efficient Source Filter Model Based Neural Speech Synthesis. In: IEEE Signal Processing Letters, 27, pp. 1170-1174.

iee_sig_pro_let_27_1170-1174_2020.pdf - Published Version
Restricted to Registered users only

Official URL: https://dx.doi.org/10.1109/LSP.2020.3005031

Abstract

Neural speech synthesizers have recently achieved high-quality synthesis for text-to-speech applications, but real-time synthesis is possible only on devices that have large memory and can sustain high computational complexity. In this work, we reduce the complexity of a speech synthesizer by reformulating the source-filter model of speech so that the excitation signal is modeled as a sum of two signals. The first signal is an impulse train computed from the pitch sequence. The second signal is modeled as white noise passed through a filter bank with frequency-dependent gains. The parameters of the reformulated source-filter model are predicted by a neural network, referred to as SFNet. The network parameters are learned by training the network with the l1 error between the log Mel-spectrum of the predicted waveform and that of the ground-truth waveform. We demonstrate a significant reduction in memory and computational complexity compared to the state-of-the-art speaker-independent neural speech synthesizer, without any loss in the naturalness of the synthesized speech. © 1994-2012 IEEE.
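
As a rough illustration of the two-part excitation described in the abstract, the sketch below builds an impulse train from a frame-level pitch sequence and adds white noise whose spectrum is scaled by per-band gains. It is not the authors' implementation. The function names, the unit impulse amplitude, the uniform band layout, the DFT-based gain shaping, and the fixed gain values are all assumptions made for illustration. In SFNet the model parameters are predicted by a neural network, and the subsequent filter stage of the source-filter model is omitted here.

```python
import cmath
import math
import random


def impulse_train(f0_hz, hop, sample_rate):
    """Unit impulses spaced one pitch period apart.

    f0_hz holds one pitch value per frame; 0 marks an unvoiced frame.
    """
    out = [0.0] * (len(f0_hz) * hop)
    phase = 0.0  # fraction of the current pitch period already elapsed
    for frame, f0 in enumerate(f0_hz):
        for k in range(hop):
            if f0 <= 0.0:
                phase = 0.0  # unvoiced: emit nothing and restart the period
                continue
            phase += f0 / sample_rate
            if phase >= 1.0:
                phase -= 1.0
                out[frame * hop + k] = 1.0
    return out


def shaped_noise(n, band_gains, seed=0):
    """White Gaussian noise with DFT bins scaled by per-band gains.

    Bands split 0..Nyquist uniformly (an assumption); a naive O(n^2) DFT
    keeps the sketch dependency-free, so use small n.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    spectrum = [
        sum(noise[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    n_bands = len(band_gains)
    for k in range(n):
        freq_bin = min(k, n - k)  # fold negative frequencies onto 0..n/2
        band = min(freq_bin * n_bands // (n // 2 + 1), n_bands - 1)
        spectrum[k] *= band_gains[band]
    # Gains are symmetric in frequency, so the inverse DFT is real.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def excitation(f0_hz, hop, sample_rate, band_gains, seed=0):
    """Sum of the pitch-driven impulse train and the gain-shaped noise."""
    pulses = impulse_train(f0_hz, hop, sample_rate)
    noise = shaped_noise(len(pulses), band_gains, seed)
    return [p + v for p, v in zip(pulses, noise)]


if __name__ == "__main__":
    # Hypothetical settings: 4 voiced frames at 125 Hz, 8 kHz sampling.
    e = excitation([125.0] * 4, hop=64, sample_rate=8000,
                   band_gains=[0.05, 0.1, 0.3])
    print(len(e))
```

With a 125 Hz pitch at 8 kHz, the pitch period is 64 samples, so each 64-sample frame in this example contains exactly one impulse. Raising the gains of the upper bands makes the noise component brighter, which is the kind of frequency-dependent control the abstract attributes to the filter bank.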

Item Type: Journal Article
Publication: IEEE Signal Processing Letters
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright of this article belongs to Institute of Electrical and Electronics Engineers Inc.
Keywords: Computational complexity; Speech synthesis; White noise; Computationally efficient; Excitation signals; Frequency dependent; Network parameters; Real-time synthesis; Source filter model of speech; Source-filter models; Speaker independents; Complex networks
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 17 Aug 2020 07:19
Last Modified: 17 Aug 2020 07:19
URI: http://eprints.iisc.ac.in/id/eprint/66328
