Representation Learning Using Convolution Neural Network for Acoustic-to-articulatory Inversion

Illa, A and Ghosh, PK (2019) Representation Learning Using Convolution Neural Network for Acoustic-to-articulatory Inversion. In: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019, 12 - 17 May 2019, Brighton, pp. 5931-5935.

Official URL: https://doi.org/10.1109/ICASSP.2019.8682506

Abstract

Recent techniques employ end-to-end systems to learn relevant features for several speech-related applications, including speech recognition and speaker verification. In this work, we focus on the task of acoustic-to-articulatory inversion (AAI), for which we propose an end-to-end system that comprises a convolution neural network (CNN) and a bidirectional long short-term memory network (BLSTM). The aim of this work is to understand the nature of the features learnt by the end-to-end model and the importance of pre-emphasis in representation learning for AAI. Further, we propose a subject adaptation scheme to overcome the limited availability of parallel acoustic-articulatory data for training an end-to-end AAI system. AAI performance is evaluated on ~3.19 hours of acoustic-articulatory data collected from 8 subjects. Experiments reveal that the frequency responses of the filters learnt by the CNN in the proposed system resemble those of mel-scale filters; hence, the performance of the proposed system (RMSE = 1.47 mm) is on par with that obtained using mel-frequency cepstral coefficients as features (RMSE = 1.42 mm). Using pre-emphasis reduces RMSE by 0.13 mm, and the proposed adaptation scheme outperforms a subject-specific AAI model by 0.21 mm in RMSE despite the limited acoustic-articulatory data available from a subject.
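
The abstract relies on three signal-level ingredients: pre-emphasis of the input speech, the mel scale against which the learnt CNN filters are compared, and RMSE in millimetres as the evaluation metric. The Python sketch below shows each in its common textbook form. It is not the authors' code. The pre-emphasis coefficient 0.97, the mel formula 2595 * log10(1 + f/700), and the helper names `pre_emphasis`, `hz_to_mel` and `rmse_mm` are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from typing import List, Sequence


def pre_emphasis(signal: Sequence[float], alpha: float = 0.97) -> List[float]:
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1].

    ASSUMPTION: alpha = 0.97 is a widely used default; the abstract does not
    state which coefficient the paper uses.
    """
    if not signal:
        return []
    out = [float(signal[0])]  # the first sample has no predecessor, so pass it through
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels with the common 2595*log10(1 + f/700) formula.

    ASSUMPTION: this is one standard mel-scale variant, not necessarily the
    one used in the paper.
    """
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def rmse_mm(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root-mean-square error between predicted and measured articulator positions (in mm)."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # A constant (DC) input shrinks to a small residual after the first sample.
    print(pre_emphasis([1.0, 1.0, 1.0]))
    # 1000 Hz maps to roughly 1000 mel under this formula.
    print(round(hz_to_mel(1000.0), 1))
    # Toy trajectories: only the last sample differs, by 2 mm.
    print(rmse_mm([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Pre-emphasis attenuates low frequencies relative to high ones, which is why a constant input is almost entirely removed after the first sample.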

Item Type:	Conference Paper
Publication:	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher:	Institute of Electrical and Electronics Engineers Inc.
Additional Information:	The copyright for this article belongs to the Institute of Electrical and Electronics Engineers Inc.
Keywords:	Convolution; Frequency response; Speech communication; Speech recognition; Articulatory inversion; BLSTM; Convolution neural network; Electromagnetic articulograph; End-to-end systems; Mel frequency cepstral coefficient; Relevant features; Speaker verification; Audio signal processing
Department/Centre:	Division of Electrical Sciences > Electrical Engineering
Date Deposited:	15 Dec 2022 08:11
Last Modified:	15 Dec 2022 08:11
URI:	https://eprints.iisc.ac.in/id/eprint/78375

Powered by EPrints. A service from The J.R.D. Tata Memorial Library, Indian Institute of Science, Bengaluru-560012, India