
A comparative study on the effect of different codecs on speech recognition accuracy using various acoustic modeling techniques

Raghavan, Srinivasa and Meenakshi, Nisha and Mittal, Sanjeev Kumar and Yarra, Chiranjeevi and Mandal, Anupam and Prasanna Kumar, KR and Ghosh, Prasanta Kumar (2017) A comparative study on the effect of different codecs on speech recognition accuracy using various acoustic modeling techniques. In: 23rd National Conference on Communications, NCC 2017, 02-04 March 2017, Chennai, India, pp. 1-6.

Official URL: https://doi.org/10.1109/NCC.2017.8077042


In this work, we study the effect of codec-induced distortion on speech recognition performance on the TIMIT corpus using eleven codecs and five acoustic modeling techniques (AMTs), including several state-of-the-art methods. The study covers a single round of encoding-decoding as well as various tandem scenarios. Experiments from the single encoding-decoding case reveal that the acoustic model from G.711A, a narrowband high bit rate codec, yields a lower phone error rate (PER) than the models from low bit rate codecs for most AMTs. Among the eleven codec-based acoustic models, those from the G.711A, G.728, G.729B, AMR-WB and G.729A codecs consistently yield the five lowest PERs across AMTs. The model trained on 'clean' speech data (PCM) performs worse than these five codec-based acoustic models in three of the five AMTs. These five models are then used in six different tandem scenarios comprising three unseen codecs. As in the single encoding-decoding case, the PER for each tandem scenario is consistently the lowest for all AMTs when the acoustic model from the G.711A codec is used. However, when the acoustic model is trained with mixed speech data from all tandem scenarios, the PER is lower than in the matched condition for four out of five AMTs.
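
For readers unfamiliar with the metric, the sketch below illustrates how a phone error rate is commonly computed: the Levenshtein (edit) distance between the reference and recognized phone sequences, divided by the number of reference phones. This is the standard textbook definition and an assumption on our part; the abstract does not say which scoring tool the authors used, and the function names and the example phone strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, insertions and deletions
    needed to turn the sequence `ref` into the sequence `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference phones into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """PER in percent: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference phone sequence must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substitution (ae -> eh) and one inserted "t".
reference = "sil dh ax k ae t sil".split()
hypothesis = "sil dh ax k eh t t sil".split()
per = phone_error_rate(reference, hypothesis)
print(f"PER = {per:.2f}%")  # 2 errors over 7 reference phones
```

Because insertions count as errors, a PER computed this way can exceed 100% when the hypothesis is much longer than the reference.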

Item Type: Conference Paper
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The Copyright of this article belongs to the Institute of Electrical and Electronics Engineers Inc.
Keywords: Bit error rate; Decoding; Encoding (symbols); Image coding; Signal encoding; Acoustic model; Comparative studies; Encoding-decoding; High bit rates; Phone error rate; Recognition accuracy; Speech recognition performance; State-of-the-art methods; Speech recognition
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 13 Jun 2022 05:54
Last Modified: 13 Jun 2022 05:54
URI: https://eprints.iisc.ac.in/id/eprint/73302
