Sajjan, Neeraj and Ganesh, Shobhana and Sharma, Neeraj and Ganapathy, Sriram and Ryant, Neville (2018) LEVERAGING LSTM MODELS FOR OVERLAP DETECTION IN MULTI-PARTY MEETINGS. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), APR 15-20, 2018, Calgary, CANADA, pp. 5249-5253.
Abstract
The detection of overlapping speech segments is of key importance in speech applications involving analysis of multi-party conversations. The detection problem is challenging because overlapping speech segments are typically captured as short speech utterances in far-field microphone recordings. In this paper, we propose detection of overlap segments using a neural network architecture consisting of long short-term memory (LSTM) models. The neural network architecture learns the presence of overlap in speech by identifying the spectrotemporal structure of overlapping speech segments. To evaluate the model performance, we perform experiments on simulated overlapped speech generated from the TIMIT database and on natural multi-talker conversational speech in the Augmented Multi-party Interaction (AMI) meeting corpus. The proposed approach yields improvements over a Gaussian mixture model based overlap detection system. Furthermore, as an application of overlap detection, integrating overlap detection into a speaker diarization task is shown to improve the diarization error rate.
Item Type: | Conference Proceedings |
---|---|
Publisher: | IEEE |
Additional Information: | Copyright for this article belongs to IEEE |
Keywords: | Overlap Detection; LSTM modeling; Speaker Diarization; Conversational Speech Analysis |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 26 Oct 2018 14:44 |
Last Modified: | 26 Oct 2018 14:44 |
URI: | http://eprints.iisc.ac.in/id/eprint/60964 |