Jayesh, MK and Sharma, M and Vonteddu, P and Shaik, MAB and Ganapathy, S (2022) Transformer Networks for Non-Intrusive Speech Quality Prediction. In: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, 18 - 22 September 2022, Incheon, pp. 4078-4082.
Abstract
This paper presents the details of our speech quality prediction system submitted to the Conferencing Speech-2022 challenge. The challenge involved the task of non-intrusive speech quality assessment for online conferencing applications. We propose two approaches for speech quality prediction in this work. The first approach combines a deep convolutional neural network (CNN) and an LSTM neural network, trained with Kullback-Leibler (KL) and cross-entropy (CE) loss functions, to estimate the mean opinion score (MOS). The second approach uses a transformer-based encoder network followed by attention pooling. We observe that the second method gives significant improvements over both our first method and the baselines provided by the challenge organizers in terms of Pearson Correlation Coefficient (PCC) and Spearman Rank Correlation Coefficient (SRCC), along with reductions in root mean square error (RMSE). The model is also seen to generalize to unseen data resources in the evaluation dataset.
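The three evaluation metrics named in the abstract have standard definitions, and a minimal sketch of how they might be computed is given here for orientation. This is not the authors' evaluation code: the function names (`pearson`, `average_ranks`, `spearman`, `rmse`) and the MOS values are hypothetical, chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson Correlation Coefficient (PCC) between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman Rank Correlation Coefficient (SRCC): PCC computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical MOS values for illustration only; not data from the paper.
true_mos = [1.5, 2.0, 3.2, 4.1, 4.8]
pred_mos = [1.7, 2.4, 3.0, 3.9, 4.5]

print("PCC :", pearson(true_mos, pred_mos))
print("SRCC:", spearman(true_mos, pred_mos))
print("RMSE:", rmse(true_mos, pred_mos))
```

Higher PCC and SRCC and lower RMSE indicate closer agreement between predicted and subjective scores. SRCC depends only on the ordering of the predictions, so it can be high even when the absolute values are offset.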
Item Type: | Conference Paper |
---|---|
Publication: | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publisher: | International Speech Communication Association |
Additional Information: | The copyright for this article belongs to International Speech Communication Association. |
Keywords: | Convolutional neural networks; Correlation methods; Deep neural networks; Forecasting; Long short-term memory; Speech communication; Mean opinion score; Mean opinion scores; Non-intrusive; Quality estimation; Quality prediction; Quality prediction system; Speech quality; Speech quality assessment; Speech quality estimation; Transformer; Mean square error |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 10 Nov 2022 06:20 |
Last Modified: | 10 Nov 2022 06:20 |
URI: | https://eprints.iisc.ac.in/id/eprint/77856 |