
Transformer Networks for Non-Intrusive Speech Quality Prediction

Jayesh, MK and Sharma, M and Vonteddu, P and Shaik, MAB and Ganapathy, S (2022) Transformer Networks for Non-Intrusive Speech Quality Prediction. In: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, 18 - 22 September 2022, Incheon, pp. 4078-4082.

INTERSPEECH_2022.pdf - Published Version
Restricted to Registered users only

Official URL: https://doi.org/10.21437/Interspeech.2022-10020

Abstract

This paper presents our speech quality prediction system submitted to the Conferencing Speech-2022 challenge, whose task was non-intrusive speech quality assessment for online conferencing applications. We propose two approaches to speech quality prediction. The first combines a deep convolutional neural network (CNN) with an LSTM network and uses Kullback-Leibler (KL) and cross-entropy (CE) loss functions to estimate mean opinion scores (MOS). The second uses a transformer-based encoder network followed by attention pooling. We observe that the second method gives significant improvements over both the first method and the baselines provided by the challenge organizers in terms of Pearson Correlation Coefficient (PCC) and Spearman Rank Correlation Coefficient (SRCC), along with reductions in root mean square error (RMSE). The model is also seen to generalize to unseen data resources in the evaluation dataset.
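The three evaluation metrics named in the abstract (PCC, SRCC and RMSE) can be illustrated with a minimal standard-library sketch. The function names and the MOS values below are hypothetical, chosen only for illustration; they are not the authors' code or results from the paper.

```python
import math


def pearson(x, y):
    """Pearson Correlation Coefficient (PCC) between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman Rank Correlation Coefficient (SRCC): PCC computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(x, y):
    """Root mean square error between two equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical ground-truth and predicted MOS values (not from the paper).
true_mos = [1.5, 2.0, 3.2, 4.1, 4.8]
pred_mos = [1.8, 2.4, 3.0, 3.9, 4.5]

print("PCC: ", pearson(true_mos, pred_mos))
print("SRCC:", spearman(true_mos, pred_mos))
print("RMSE:", rmse(true_mos, pred_mos))
```

Higher PCC and SRCC and lower RMSE indicate predictions that track the subjective scores more closely. In this toy example the predictions preserve the ordering of the true scores, so SRCC is at its maximum even though the values themselves differ.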

Item Type: Conference Paper
Publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Additional Information: The copyright for this article belongs to International Speech Communication Association.
Keywords: Convolutional neural networks; Correlation methods; Deep neural networks; Forecasting; Long short-term memory; Speech communication; Mean opinion score; Mean opinion scores; Non-intrusive; Quality estimation; Quality prediction; Quality prediction system; Speech quality; Speech quality assessment; Speech quality estimation; Transformer; Mean square error
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 10 Nov 2022 06:20
Last Modified: 10 Nov 2022 06:20
URI: https://eprints.iisc.ac.in/id/eprint/77856
