Madhavi, Maulik C and Sharma, Shubham and Patil, Hemant A (2015) Vocal Tract Length Normalization Features for Audio Search. In: 18th International Conference on Text, Speech and Dialogue (TSD), SEP 14-17, 2015, Pilsen, CZECH REPUBLIC, pp. 387-395.
Full text not available from this repository.
Abstract
This paper presents speaker normalization approaches for the audio search task. The conventional state-of-the-art feature set, viz., Mel Frequency Cepstral Coefficients (MFCC), is known to implicitly contain both speaker-specific and linguistic information. This can create problems for a speaker-independent audio search task. In this paper, a universal warping-based approach is used for vocal tract length normalization in audio search. In particular, features such as the scale transform and warped linear prediction are used to compensate for speaker variability in audio matching. The advantage of these features over the conventional feature set is that they apply universal frequency warping to both templates being matched during audio search. The performance of Scale Transform Cepstral Coefficients (STCC) and Warped Linear Prediction Cepstral Coefficients (WLPCC) is about 3% higher than that of the state-of-the-art MFCC feature set on the TIMIT database.
Item Type: | Conference Proceedings |
---|---|
Series: | Lecture Notes in Artificial Intelligence |
Publisher: | SPRINGER-VERLAG BERLIN |
Additional Information: | Copyright for this article belongs to SPRINGER-VERLAG BERLIN, HEIDELBERGER PLATZ 3, D-14197 BERLIN, GERMANY |
Keywords: | Vocal tract length normalization; Audio search; Scale transform cepstral coefficients; Warped linear prediction coefficients |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 06 Jan 2016 05:41 |
Last Modified: | 06 Jan 2016 05:41 |
URI: | http://eprints.iisc.ac.in/id/eprint/53068 |