ePrints@IISc

Towards Relevance and Sequence Modeling in Language Recognition

Padi, B and Mohan, A and Ganapathy, S (2020) Towards Relevance and Sequence Modeling in Language Recognition. In: IEEE/ACM Transactions on Audio Speech and Language Processing, 28 . pp. 1223-1232.

PDF: iee_acm_tra_aud_spe_lan_pro_28_1223-1232_2020.pdf - Published Version (1MB, restricted to registered users only)
Official URL: https://dx.doi.org/10.1109/TASLP.2020.2983580

Abstract

The task of automatic language identification (LID) involving multiple dialects of the same language family in the presence of noise is a challenging problem. In these scenarios, the identity of the language/dialect may be reliably present only in parts of the temporal sequence of the speech signal. Conventional approaches to LID (and to speaker recognition) ignore this sequence information by extracting a long-term statistical summary of the recording, assuming independence of the feature frames. In this paper, we propose a neural network framework utilizing short-sequence information in language recognition. In particular, a new model is proposed for incorporating relevance in language recognition, where parts of the speech data are weighted more based on their relevance to the language recognition task. This relevance weighting is achieved using a bidirectional long short-term memory (BLSTM) network with attention modeling. We explore two approaches: in the first, segment-level i-vector/x-vector representations are aggregated in the neural model; in the second, the acoustic features are modeled directly in an end-to-end neural model. Experiments are performed on the language recognition task of the NIST LRE 2017 Challenge using clean, noisy, and multi-speaker speech data, as well as on the RATS language recognition corpus. In these experiments on noisy LRE tasks and on the RATS dataset, the proposed approach yields significant improvements over conventional i-vector/x-vector based language recognition approaches, as well as over previous models incorporating sequence information.
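The relevance weighting described in the abstract can be illustrated with a minimal sketch of attention-based pooling over segment embeddings. This is a simplified, hypothetical illustration only, not the paper's implementation: the BLSTM encoder is omitted, the parameters are random, and all names (`attention_pool`, `w`, `b`, `v`) are assumptions. It shows the core idea that each segment receives a learned relevance weight, and the utterance-level representation is the weighted sum of segment vectors.

```python
import numpy as np

def attention_pool(segment_embeddings, w, b, v):
    """Relevance-weighted pooling over T segment embeddings of dimension D.

    segment_embeddings: (T, D) array, e.g. segment-level x-vectors.
    w: (D, H) projection, b: (H,) bias, v: (H,) scoring vector.
    Returns the (D,) pooled representation and the (T,) attention weights.
    """
    h = np.tanh(segment_embeddings @ w + b)      # (T, H) hidden scores
    scores = h @ v                               # (T,) scalar score per segment
    alphas = np.exp(scores - scores.max())       # numerically stable softmax
    alphas /= alphas.sum()                       # weights sum to 1
    pooled = alphas @ segment_embeddings         # (D,) relevance-weighted sum
    return pooled, alphas

# Toy example with random "segment embeddings" and random parameters.
rng = np.random.default_rng(0)
T, D, H = 5, 8, 4
X = rng.standard_normal((T, D))
w = rng.standard_normal((D, H))
b = np.zeros(H)
v = rng.standard_normal(H)
pooled, alphas = attention_pool(X, w, b, v)
```

In the paper's setting, `w`, `b`, and `v` would be trained jointly with the language classifier, so segments carrying stronger language/dialect cues receive larger weights than noisy or uninformative segments.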

Item Type: Journal Article
Publication: IEEE/ACM Transactions on Audio Speech and Language Processing
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Additional Information: The copyright of this article belongs to IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Keywords: Modeling languages; Natural language processing systems; Rats; Syntactics; Automatic language identification; Conventional approach; Language recognition; Network frameworks; Sequence information; Speaker recognition; Statistical summary; Temporal sequences; Speech recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 23 Jun 2020 09:29
Last Modified: 23 Jun 2020 09:29
URI: http://eprints.iisc.ac.in/id/eprint/65504
