Illa, A and Ghosh, PK (2019) Representation Learning Using Convolution Neural Network for Acoustic-to-articulatory Inversion. In: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019, 12-17 May 2019, Brighton, pp. 5931-5935.
Abstract
Recent techniques employ end-to-end systems to learn relevant features for several speech-related applications, including speech recognition and speaker verification. In this work, we focus on the task of acoustic-to-articulatory inversion (AAI), for which we propose an end-to-end system comprising a convolution neural network (CNN) and a bidirectional long short-term memory network (BLSTM). The aim of this work is to understand the nature of the features learnt by the end-to-end model and the importance of pre-emphasis in representation learning for AAI. Further, we propose a subject adaptation scheme to overcome the limited availability of parallel acoustic-articulatory data for training an end-to-end AAI system. AAI performance is evaluated on ~3.19 hours of acoustic-articulatory data collected from 8 subjects. Experiments reveal that the frequency responses of the filters learnt by the CNN in the proposed system resemble those of a mel-scale filterbank; hence, the performance of the proposed system (RMSE = 1.47 mm) is on par with that of a system using mel-frequency cepstral coefficients (MFCCs) as features (RMSE = 1.42 mm). Using pre-emphasis reduces the RMSE by 0.13 mm, and the proposed adaptation scheme outperforms a subject-specific AAI model by 0.21 mm in RMSE despite limited acoustic-articulatory data from a subject.
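The abstract relies on three signal-processing quantities: pre-emphasis of the input speech, the mel scale that the learnt CNN filters are compared against, and RMSE in millimetres between predicted and measured articulator positions. The following is a minimal sketch of those quantities only, not the authors' implementation. The pre-emphasis coefficient 0.97 and the mel formula m = 2595 log10(1 + f/700) are common conventions assumed here; the abstract states neither. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def pre_emphasis(x: Sequence[float], alpha: float = 0.97) -> List[float]:
    """First-order high-pass pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed, commonly used default; the coefficient used
    in the paper is not given in the abstract.
    """
    if len(x) == 0:
        return []
    y = [float(x[0])]  # the first sample has no predecessor, so pass it through
    for n in range(1, len(x)):
        y.append(x[n] - alpha * x[n - 1])
    return y


def hz_to_mel(f_hz: float) -> float:
    """Assumed mel convention: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, sample_rate: float) -> List[float]:
    """Centre frequencies (Hz) of n_filters bands spaced uniformly on the mel
    scale between 0 Hz and the Nyquist frequency. Such a mel-spaced reference
    is what learnt filter responses can be compared against: uniform mel
    steps give Hz spacing that widens with frequency."""
    top = hz_to_mel(sample_rate / 2.0)
    step = top / (n_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_filters)]


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between predicted and measured articulator
    positions; if both are in mm, the result is in mm."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))
```

For example, `mel_center_frequencies(40, 16000)` returns 40 centre frequencies whose spacing grows towards 8 kHz. The sampling rate and filter count in that call are illustrative assumptions, not settings reported in the paper.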
Item Type: | Conference Paper |
---|---|
Publication: | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Additional Information: | The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc. |
Keywords: | Convolution; Frequency response; Speech communication; Speech recognition; Articulatory inversion; BLSTM; Convolution neural network; electromagnetic articulograph; End-to-end systems; Mel-frequency cepstral coefficient; Relevant features; Speaker verification; Audio signal processing |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 15 Dec 2022 08:11 |
Last Modified: | 15 Dec 2022 08:11 |
URI: | https://eprints.iisc.ac.in/id/eprint/78375 |