Madhavi, Maulik C and Sharma, Shubham and Patil, Hemant A (2016) VTLN Using Different Warping Functions for Template Matching. [Book Chapter]
Full text not available from this repository.

Abstract
In most automatic speech recognition (ASR) systems, speaker differences are compensated for by normalizing the vocal tract lengths of the speakers. This is implemented by warping the frequency axis by an appropriate warping factor. However, it is computationally expensive to find a warping factor for each speaker. This problem is overcome by incorporating a universal warping function for all speakers. Different psychoacoustic scales have been proposed over the past decade that are assumed to be similar to the frequency response of the basilar membrane (BM) of the human auditory system. In this paper, different warping functions are studied with the aim of vocal tract length normalization (VTLN), and template matching experiments are carried out using the dynamic time warping (DTW) algorithm to test the performance of the various warping functions. It was observed that features obtained by warping the frequency axis with psychoacoustic scales improve the classification performance. In particular, Equivalent Rectangular Bandwidth (ERB)-scale based warping improves the precision by 7.17% over state-of-the-art mel-frequency cepstral coefficients (MFCC) for template matching on isolated digits of the TIDIGITS database, and by 6.16% on words from the TIMIT database.
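The abstract names the mel and ERB scales and DTW-based template matching without giving formulas. As a rough illustration only, the sketch below uses commonly cited closed forms for the two scales (the 2595/700 mel formula and the Glasberg and Moore ERB-rate formula) and a textbook DTW with Euclidean local cost. These choices, and the function names `hz_to_mel`, `hz_to_erb_rate`, and `dtw_distance`, are assumptions for illustration and are not taken from the chapter; the authors' exact warping functions and DTW constraints may differ.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to the mel scale (assumed 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (assumed Glasberg-Moore form)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    a, b: lists of equal-dimension tuples (one tuple per frame).
    Uses Euclidean local cost and the basic step pattern
    (insertion, deletion, match) with no path constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

In a template-matching setup of the kind the abstract describes, a test utterance would be assigned the label of the reference template with the smallest `dtw_distance`, with the frames computed from a filterbank spaced uniformly on the chosen warped scale.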
Item Type: | Book Chapter |
---|---|
Series: | Studies in Big Data |
Publisher: | SPRINGER-VERLAG BERLIN |
Additional Information: | The copyright of this article belongs to the Springer Science and Business Media Deutschland GmbH |
Keywords: | Vocal tract length normalization; Frequency warping; Dynamic time warping; Template matching |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 08 Jul 2016 05:53 |
Last Modified: | 13 Jul 2022 05:05 |
URI: | https://eprints.iisc.ac.in/id/eprint/54164 |