Kumar, Rajath and Yeruva, Vaishnavi and Ganapathy, Sriram (2018) On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification. In: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018, 2 September to 6 September 2018, Hyderabad International Convention Centre (HICC), Hyderabad, pp. 1121-1125.
PDF: interspeech(5).pdf (Published Version, 237 kB), restricted to registered users only
Abstract
Personalized keyword detection systems that also perform text-dependent speaker verification (TDSV) have received substantial interest recently. Conventional approaches develop the TDSV and wake-word detection systems separately. In this paper, we show that TDSV and keyword spotting (KWS) can be jointly modeled using the convolutional long short-term memory (CLSTM) model architecture, in which an initial convolutional feature map is further processed by an LSTM recurrent network. Even with a small amount of training data for developing the CLSTM system, we show that the model accurately detects the presence of the keyword in a spoken utterance. For the TDSV task, the multi-task learning (MTL) model can be well regularized using the CLSTM training examples from the personalized wake-up task. The experiments on KWS wake-up detection and TDSV use the combined speech recordings from the Wall Street Journal (WSJ) and LibriSpeech corpora. In these experiments with multiple keywords, we show that the proposed MTL approach significantly improves the performance of previously proposed neural-network-based TDSV systems. We also show experimentally that the CLSTM model provides significant improvements over previously proposed keyword detection systems (an average relative improvement of 30% over previous approaches).
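The abstract describes the architecture only at a high level: a convolutional feature map is passed to an LSTM, and keyword spotting and speaker verification share that trunk under multi-task learning. The NumPy sketch below illustrates the idea as a forward pass plus a joint loss. Everything specific in it is an assumption, not the authors' configuration: the 40-bin filterbank input, a single 3x3 convolution layer with four kernels, using the LSTM's final hidden state, the two softmax heads (keyword class and speaker class), and the hypothetical loss weight `alpha`. Training (backpropagation) is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def conv2d_relu(x, kernels):
    """Valid 2-D cross-correlation of a (T, F) feature matrix with
    K kernels of shape (kt, kf), followed by ReLU -> (K, T', F')."""
    _, kt, kf = kernels.shape
    windows = sliding_window_view(x, (kt, kf))          # (T', F', kt, kf)
    maps = np.einsum("tfij,kij->ktf", windows, kernels)
    return np.maximum(maps, 0.0)

def lstm_final_state(seq, W, U, b):
    """Single-layer LSTM over seq (T', D); returns the last hidden state."""
    hidden = U.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in seq:
        # input, forget, output gates and candidate cell (assumed ordering)
        i, f, o, g = np.split(W @ x_t + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def init_params(rng, n_feats=40, n_kernels=4, kt=3, kf=3,
                hidden=16, n_keywords=3, n_speakers=5):
    """Random weights; all sizes here are illustrative assumptions."""
    d_in = n_kernels * (n_feats - kf + 1)  # conv maps flattened per frame
    s = 0.1
    return {
        "kernels": s * rng.standard_normal((n_kernels, kt, kf)),
        "W": s * rng.standard_normal((4 * hidden, d_in)),
        "U": s * rng.standard_normal((4 * hidden, hidden)),
        "b": np.zeros(4 * hidden),
        "V_kw": s * rng.standard_normal((n_keywords, hidden)),   # KWS head
        "V_spk": s * rng.standard_normal((n_speakers, hidden)),  # speaker head
    }

def clstm_forward(feats, p):
    """Shared conv + LSTM trunk feeding two task-specific softmax heads."""
    maps = conv2d_relu(feats, p["kernels"])              # (K, T', F')
    k, t_out, f_out = maps.shape
    seq = maps.transpose(1, 0, 2).reshape(t_out, k * f_out)  # one vector per frame
    h = lstm_final_state(seq, p["W"], p["U"], p["b"])
    return softmax(p["V_kw"] @ h), softmax(p["V_spk"] @ h)

def mtl_loss(p_kw, p_spk, kw_label, spk_label, alpha=0.5):
    """Weighted sum of the two cross-entropies; alpha is a hypothetical knob."""
    return -(alpha * np.log(p_kw[kw_label])
             + (1.0 - alpha) * np.log(p_spk[spk_label]))

rng = np.random.default_rng(0)
params = init_params(rng)
feats = rng.standard_normal((50, 40))  # 50 frames x 40 bins of random "features"
p_kw, p_spk = clstm_forward(feats, params)
loss = mtl_loss(p_kw, p_spk, kw_label=1, spk_label=2)
```

Because both heads read the same hidden state `h`, gradients from the keyword loss would also shape the representation used by the speaker head, which is one way to read the abstract's statement that the wake-up training examples regularize the TDSV model.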
Item Type: | Conference Proceedings |
---|---|
Series: | Interspeech |
Publisher: | International Speech Communication Association (ISCA) |
Additional Information: | 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Hyderabad, India, 2-6 September 2018 |
Keywords: | Text-dependent speaker verification; Keyword spotting (KWS); Convolutional long short-term memory (CLSTM) network; Multi-task learning |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 09 Jun 2020 06:05 |
Last Modified: | 09 Jun 2020 06:05 |
URI: | http://eprints.iisc.ac.in/id/eprint/62919 |