End-to-end Language Recognition Using Attention Based Hierarchical Gated Recurrent Unit Models

Padi, B and Mohan, A and Ganapathy, S (2019) End-to-end Language Recognition Using Attention Based Hierarchical Gated Recurrent Unit Models. In: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019, 12 - 17 May 2019, Brighton, pp. 5966-5970.


Official URL: https://doi.org/10.1109/ICASSP.2019.8683895

Abstract

The task of automatic language identification (LID) involving multiple dialects of the same language family on short speech recordings is a challenging problem. It is further complicated when the short-duration audio snippets contain noise sources. In these scenarios, the identity of the language/dialect may be reliably present only in parts of the speech embedded in the temporal sequence. The conventional approaches to LID (and to speaker recognition) ignore the sequence information by extracting a long-term statistical summary of the recording, assuming independence of the feature frames. In this paper, we propose an end-to-end neural network framework that utilizes short-sequence information in language recognition. A hierarchical gated recurrent unit (HGRU) model with an attention module is proposed for incorporating relevance in language recognition, where parts of the speech data are weighted more heavily based on their relevance for the language recognition task. Experiments are performed on the language recognition task of the NIST LRE 2017 Challenge using clean, noisy and multi-speaker speech data. In these experiments, the proposed approach yields significant improvements over the conventional i-vector based language recognition approaches as well as over a previously proposed approach to language recognition using recurrent networks.
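The relevance weighting mentioned in the abstract can be illustrated with a generic attention-pooling step. The following is a minimal sketch under stated assumptions, not the authors' HGRU model: it scores each frame with a dot product against a scoring vector, normalizes the scores with a softmax over time, and returns the weighted sum of the frames. The names `attention_pool` and `score_vector` are hypothetical, and the toy frames stand in for the outputs of a recurrent layer.

```python
import math


def attention_pool(frames, score_vector):
    """Collapse a sequence of feature vectors into a single vector.

    frames: list of T equal-length lists (stand-ins for recurrent-layer outputs).
    score_vector: hypothetical learned vector used to score each frame.
    Returns (pooled_vector, attention_weights).
    """
    # One relevance score per frame: dot product with the scoring vector.
    scores = [sum(f_i * s_i for f_i, s_i in zip(frame, score_vector))
              for frame in frames]
    # Softmax over time, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of frames: higher-scoring frames contribute more.
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights


# Toy input: three 2-dimensional frames (illustrative values only).
frames = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
pooled, weights = attention_pool(frames, [1.0, 1.0])

# With an all-zero scoring vector every frame gets the same weight,
# so the result reduces to a plain average over frames.
uniform_pooled, uniform_weights = attention_pool(frames, [0.0, 0.0])
```

With a zero scoring vector the pooling reduces to a frame average, which is the kind of order-independent, long-term summary the abstract attributes to conventional approaches. A non-trivial scoring vector lets some frames dominate the utterance-level representation.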

Item Type:	Conference Paper
Publication:	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher:	Institute of Electrical and Electronics Engineers Inc.
Additional Information:	The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc.
Keywords:	Audio acoustics; Audio recordings; Audio signal processing; Natural language processing systems; Recurrent neural networks; Speech; Speech communication; Syntactics; attention; Automatic language identification; Conventional approach; hierarchical GRU; Language identification; Language recognition; Sequence information; Speaker recognition; Speech recognition
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	30 Nov 2022 09:35
Last Modified:	30 Nov 2022 09:35
URI:	https://eprints.iisc.ac.in/id/eprint/78371
