ePrints@IISc

Efficient Human-Quality Kannada TTS using Transfer Learning on NVIDIA's Tacotron2

Anil Kumar, KK and Shiva Kumar, HR and Ganesan, RA and Jnanesh, KP (2021) Efficient Human-Quality Kannada TTS using Transfer Learning on NVIDIA's Tacotron2. In: 7th IEEE International Conference on Electronics, Computing and Communication Technologies, 9-11 Jul 2021, Bangalore.

PDF: IEEE_CONECCT_2021.pdf - Published Version (600kB)
Restricted to Registered users only
Official URL: https://doi.org/10.1109/CONECCT52877.2021.9622581

Abstract

Very good quality speech synthesis systems exist for languages such as English and Chinese. However, the development of TTS for Indian languages has received increased attention only in the recent past. There have been several reasons for this: 1) lack of an adequate market, and 2) non-availability of quality training data. In this work, we have developed a human-like quality Kannada text-to-speech conversion system using about 44.8 hours of training data recorded in a studio by a Kannada teacher with good diction. We have used transfer learning, continuing training from Tacotron2 and WaveGlow checkpoints pre-trained on English. Evaluation by thirty-five native Kannada speakers resulted in an overall MOS of 4.51 ± 0.52, whereas the original speech of the speaker was given an MOS of 4.62 ± 0.53. In another independent test, a different set of 25 human evaluators was given ten pairs, each consisting of an original utterance of the speaker and the synthesized speech of the same sentence; some of the synthesized speech samples were judged to be better than the original! In a final round of evaluation, five sentences were synthesized by our TTS, Google's WaveNet TTS and Nuance's TTS. Native Kannada speakers were presented with these outputs in random order and asked to choose the one they most preferred. Based on 55 human evaluators, RaGaVeRa's Kannada TTS obtained a mean preference score of 78.2, whereas Google's and Nuance's TTS obtained scores of 13.1 and 5.1, respectively. Thus, to the best of the authors' knowledge, this is the best-quality TTS achieved for Kannada so far. © 2021 IEEE.
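The abstract reports two kinds of listening-test numbers: a mean opinion score (MOS) with a ± spread, and a per-system "mean preference score". The sketch below shows one plausible way to compute each from raw listener responses. It rests on assumptions not stated in the abstract. The ± value is taken to be the sample standard deviation of all ratings on a 1 to 5 scale. The preference score is taken to be the percentage of evaluators choosing each system, averaged over sentences. The function names `mos` and `preference_scores` and the toy data are hypothetical and are not the authors' evaluation code.

```python
from collections import Counter
from statistics import mean, stdev


def mos(ratings):
    """Return (mean, sample standard deviation) of 1-5 opinion scores.

    Assumption: the '±' figure in a reported MOS is the sample standard
    deviation over all individual ratings (not a confidence interval).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for a standard deviation")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return mean(ratings), stdev(ratings)


def preference_scores(choices_per_sentence, systems):
    """Percentage of evaluators choosing each system, averaged over sentences.

    Assumption (hypothetical definition): each inner list holds one chosen
    system label per evaluator for a single sentence.
    """
    if not choices_per_sentence:
        raise ValueError("no sentences")
    per_sentence = []
    for choices in choices_per_sentence:
        if not choices:
            raise ValueError("a sentence has no evaluator choices")
        counts = Counter(choices)
        total = len(choices)
        # Share of evaluators (in percent) who preferred each system for this sentence.
        per_sentence.append({s: 100.0 * counts[s] / total for s in systems})
    # Average the per-sentence percentages for each system.
    return {s: mean(p[s] for p in per_sentence) for s in systems}


if __name__ == "__main__":
    # Toy data only; these are not the paper's ratings or votes.
    m, sd = mos([5, 4, 5, 5, 4])
    print(f"MOS = {m:.2f} ± {sd:.2f}")
    votes = [["A", "A", "A", "B"], ["A", "A", "C", "B"]]
    print(preference_scores(votes, ["A", "B", "C"]))
```

Averaging per-sentence percentages weights every sentence equally, even when the number of evaluators varies between sentences. Pooling all votes instead would weight each vote equally. The abstract does not say which convention was used.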

Item Type: Conference Paper
Publication: Proceedings of CONECCT 2021: 7th IEEE International Conference on Electronics, Computing and Communication Technologies
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc.
Keywords: Deep learning; Personnel training, Deep learning; End to end; End-to-end TTS; English; Google+; Kannada; Nuance; Ragavera; Tacotron2; Transfer learning; Waveglow, Speech synthesis
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 07 Feb 2022 12:21
Last Modified: 07 Feb 2022 12:21
URI: http://eprints.iisc.ac.in/id/eprint/71283
