SFNet: A Computationally Efficient Source Filter Model Based Neural Speech Synthesis

Mv, AR and Ghosh, PK (2020) SFNet: A Computationally Efficient Source Filter Model Based Neural Speech Synthesis. In: IEEE Signal Processing Letters, 27. pp. 1170-1174.

Official URL: https://dx.doi.org/10.1109/LSP.2020.3005031

Abstract

Neural speech synthesizers have recently achieved high-quality synthesis for text-to-speech applications, but real-time synthesis is possible only on devices that have large memory and can sustain high computational complexity. In this work, we reduce the complexity of a speech synthesizer by reformulating the source-filter model of speech so that the excitation signal is modeled as a sum of two signals. The first signal contains an impulse train that is computed from the pitch sequence. The second signal is modeled as white noise passed through a filter bank with frequency-dependent gains. The parameters of the reformulated source-filter model are predicted by a neural network, referred to as SFNet. The network is trained to minimize the l1 error between the log Mel-spectrum of the predicted waveform and that of the ground-truth waveform. We demonstrate a significant reduction in memory and computational complexity compared to the state-of-the-art speaker-independent neural speech synthesizer, without any loss in the naturalness of the synthesized speech. © 1994-2012 IEEE.
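
The two-part excitation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (impulse_train, shaped_noise, excitation), the equal-width FFT bands, and the use of a single time-invariant gain per band are assumptions made for illustration only. In SFNet the model parameters are predicted by the neural network; here they are simply passed in by hand.

```python
import numpy as np


def impulse_train(f0_hz, sample_rate, hop):
    """Unit impulse train from a frame-level pitch sequence (Hz); 0 marks unvoiced frames."""
    # Hold each frame's pitch value for `hop` samples (assumed upsampling scheme).
    f0 = np.repeat(np.asarray(f0_hz, dtype=float), hop)
    # Accumulated phase in cycles; it stays flat while f0 is zero.
    phase = np.cumsum(f0 / sample_rate)
    # Place an impulse wherever the phase crosses an integer number of cycles.
    pulses = np.diff(np.floor(phase), prepend=0.0) > 0
    return pulses.astype(float)


def shaped_noise(band_gains, n_samples, rng):
    """White noise scaled by one gain per equal-width frequency band (assumed layout)."""
    gains = np.asarray(band_gains, dtype=float)
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    n_bins = spectrum.shape[0]
    # Map every FFT bin to the index of the band that contains it.
    band_of_bin = np.arange(n_bins) * len(gains) // n_bins
    return np.fft.irfft(spectrum * gains[band_of_bin], n=n_samples)


def excitation(f0_hz, band_gains, sample_rate, hop, rng):
    """Sum of the pitch-driven impulse train and the band-shaped noise."""
    pulses = impulse_train(f0_hz, sample_rate, hop)
    return pulses + shaped_noise(band_gains, pulses.shape[0], rng)
```

For example, excitation([100.0] * 100, [0.1, 0.05, 0.02, 0.01], 16000, 160, np.random.default_rng(0)) yields one second of excitation at 16 kHz with a 100 Hz pitch. The sample rate, hop size, and gain values in that call are arbitrary choices, not values taken from the paper.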

Item Type:	Journal Article
Publication:	IEEE Signal Processing Letters
Publisher:	Institute of Electrical and Electronics Engineers Inc.
Additional Information:	The copyright of this article belongs to Institute of Electrical and Electronics Engineers Inc.
Keywords:	Computational complexity; Speech synthesis; White noise; Computationally efficient; Excitation signals; Frequency dependent; Network parameters; Real-time synthesis; Source filter model of speech; Source-filter models; Speaker independents; Complex networks
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	17 Aug 2020 07:19
Last Modified:	17 Aug 2020 07:19
URI:	http://eprints.iisc.ac.in/id/eprint/66328
