On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification

Kumar, Rajath and Yeruva, Vaishnavi and Ganapathy, Sriram (2018) On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification. In: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018, 2 September to 6 September 2018, Hyderabad International Convention Centre (HICC), Hyderabad, pp. 1121-1125.


Official URL: https://dx.doi.org/10.21437/Interspeech.2018-1759

Abstract

The task of building a personalized keyword detection system that also performs text-dependent speaker verification (TDSV) has received substantial interest recently. Conventional approaches to this task develop the TDSV and wake-word detection systems separately. In this paper, we show that TDSV and keyword spotting (KWS) can be jointly modeled using the convolutional long short-term memory (CLSTM) model architecture, where an initial convolutional feature map is further processed by an LSTM recurrent network. Given a small amount of training data for developing the CLSTM system, we show that the model provides accurate detection of the presence of the keyword in a spoken utterance. For the TDSV task, the multi-task learning (MTL) model can be well regularized using the CLSTM training examples for the personalized wake-up task. The experiments are performed for KWS wake-up detection and TDSV using the combined speech recordings from the Wall Street Journal (WSJ) and LibriSpeech corpora. In these experiments with multiple keywords, we illustrate that the proposed MTL approach significantly improves the performance of previously proposed neural-network-based text-dependent SV systems. We also experimentally illustrate that the CLSTM model provides significant improvements over previously proposed keyword detection systems (average relative improvements of 30% over previous approaches).
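
The abstract describes the architecture only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' implementation: a convolution over a window of feature frames, an LSTM over the resulting feature map, and two softmax heads (keyword and speaker) sharing that trunk for multi-task learning. All names (`clstm_forward`, `init_params`, `multitask_loss`), the layer sizes, the random untrained weights, and the equal weighting of the two losses are assumptions made for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def conv_over_time(frames, kernels, width):
    """Slide each kernel over `width` consecutive frames; apply ReLU."""
    out = []
    for t in range(len(frames) - width + 1):
        window = [x for frame in frames[t:t + width] for x in frame]
        out.append([max(0.0, y) for y in matvec(kernels, window)])
    return out

def lstm_last_state(inputs, W, hidden):
    """Single-layer LSTM (no biases); W maps [x; h] to four stacked gates."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in inputs:
        z = matvec(W, x + h)
        i = [sigmoid(v) for v in z[0:hidden]]                 # input gate
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]        # forget gate
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]    # output gate
        g = [math.tanh(v) for v in z[3 * hidden:4 * hidden]]  # candidate cell
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h

def init_params(feat_dim, width, n_kernels, hidden, n_keywords, n_speakers, seed=0):
    """Hypothetical sizes; weights are random, not trained."""
    rng = random.Random(seed)
    return {
        "width": width,
        "hidden": hidden,
        "kernels": rand_matrix(n_kernels, width * feat_dim, rng),
        "lstm": rand_matrix(4 * hidden, n_kernels + hidden, rng),
        "kw_head": rand_matrix(n_keywords, hidden, rng),
        "spk_head": rand_matrix(n_speakers, hidden, rng),
    }

def clstm_forward(frames, params):
    """Shared conv + LSTM trunk feeding a keyword head and a speaker head."""
    conv = conv_over_time(frames, params["kernels"], params["width"])
    h = lstm_last_state(conv, params["lstm"], params["hidden"])
    return softmax(matvec(params["kw_head"], h)), softmax(matvec(params["spk_head"], h))

def multitask_loss(kw_probs, spk_probs, kw_label, spk_label, spk_weight=1.0):
    """Sum of the two cross-entropy terms; the weighting is an assumption."""
    return -math.log(kw_probs[kw_label]) - spk_weight * math.log(spk_probs[spk_label])

# Usage on a synthetic utterance of 10 frames with 4 features each.
params = init_params(feat_dim=4, width=3, n_kernels=5, hidden=6,
                     n_keywords=3, n_speakers=2)
rng = random.Random(1)
frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
kw_probs, spk_probs = clstm_forward(frames, params)
loss = multitask_loss(kw_probs, spk_probs, kw_label=0, spk_label=1)
```

Because both heads read the same LSTM state, gradients from the keyword loss would also shape the representation used for speaker verification, which is one way the regularization effect mentioned in the abstract could arise.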

Item Type:	Conference Proceedings
Series:	Interspeech
Publisher:	ISCA-INT SPEECH COMMUNICATION ASSOC
Additional Information:	19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Hyderabad, India, 2-6 September 2018
Keywords:	Text dependent speaker verification; Key Word Spotting (KWS); Convolutional Long Short Term Memory (CLSTM) Network; Multi-task learning
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	09 Jun 2020 06:05
Last Modified:	09 Jun 2020 06:05
URI:	http://eprints.iisc.ac.in/id/eprint/62919
