Mannem, R and Jyothi, H and Illa, A and Ghosh, PK (2020) Speech rate estimation using representations learned from speech with convolutional neural network. In: SPCOM 2020 - International Conference on Signal Processing and Communications, 19 July - 24 July 2020, Bangalore.
Abstract
With advances in machine learning techniques, several speech-related applications deploy end-to-end models to learn relevant features from the raw speech signal. In this work, we focus on the speech rate estimation task, using an end-to-end model to learn representations from raw speech in a data-driven manner. We propose an end-to-end model that comprises a 1-d convolutional layer to extract representations from raw speech and a convolutional dense neural network (CDNN) to predict speech rate from these representations. The primary aim of the work is to understand the nature of the representations learned by the end-to-end model for the speech rate estimation task. Experiments are performed using the TIMIT corpus, in seen and unseen subject conditions. Experimental results reveal that the frequency responses of the learned 1-d CNN filters are low-pass in nature, and that the center frequencies of the majority of the filters lie below 1000 Hz. Comparing the proposed end-to-end system with the baseline MFCC-based approach, we find that the performance of the features learned with the CNN is on par with that of MFCC.
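The filter analysis mentioned in the abstract (inspecting the frequency response of 1-d convolutional filters and locating their center frequencies) can be illustrated with a minimal sketch. This is not the authors' code. It assumes a 16 kHz sampling rate (the rate of TIMIT audio), takes "center frequency" to mean the frequency at which the magnitude response peaks (the paper may define it differently), and uses a hypothetical 32-tap moving-average kernel as a stand-in for a learned filter.

```python
import cmath
import math

SAMPLE_RATE_HZ = 16000  # TIMIT audio is sampled at 16 kHz


def magnitude_response(kernel, n_fft=512):
    """Return (freqs_hz, magnitudes) of a 1-D FIR kernel on [0, Nyquist]."""
    freqs, mags = [], []
    for k in range(n_fft // 2 + 1):
        # DFT bin k of the kernel, implicitly zero-padded to n_fft samples
        acc = sum(w * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, w in enumerate(kernel))
        freqs.append(k * SAMPLE_RATE_HZ / n_fft)
        mags.append(abs(acc))
    return freqs, mags


def center_frequency(kernel, n_fft=512):
    """Assumed definition: frequency (Hz) where the magnitude response peaks."""
    freqs, mags = magnitude_response(kernel, n_fft)
    peak = max(range(len(mags)), key=mags.__getitem__)
    return freqs[peak]


# Hypothetical stand-in for a learned filter: a 32-tap moving average,
# which is low-pass by construction.
example_kernel = [1.0 / 32] * 32
freqs, mags = magnitude_response(example_kernel)
fc = center_frequency(example_kernel)
```

In practice, each kernel would be read from the trained first convolutional layer and passed through `center_frequency` to count how many filters peak below 1000 Hz.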
Item Type: | Conference Proceedings |
---|---|
Publication: | SPCOM 2020 - International Conference on Signal Processing and Communications |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Additional Information: | The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc. |
Keywords: | Convolution; Convolutional neural networks; Frequency response; Learning systems; Multilayer neural networks; Signal processing; Center frequency; Data driven; End-to-end models; End-to-end systems; Machine learning techniques; Relevant features; Speech rates; Speech signals; Speech |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 06 Feb 2023 06:30 |
Last Modified: | 06 Feb 2023 06:30 |
URI: | https://eprints.iisc.ac.in/id/eprint/79851 |