Mel-scale sub-band modelling for perceptually improved time-scale modification of speech and audio signals

Sharma, Neeraj and Potadar, Shreepad and Chetupalli, Srikanth Raj and Sreenivas, TV (2017) Mel-scale sub-band modelling for perceptually improved time-scale modification of speech and audio signals. In: 23rd National Conference on Communications, NCC 2017, 02-04 March 2017, Chennai, India, pp. 1-5.

Official URL: https://doi.org/10.1109/NCC.2017.8077073

Abstract

High-quality time-scale modification (TSM) of speech and audio is a long-standing challenge. The crux of the challenge is to maintain the perceptual subtleties of temporal variations in pitch and timbre even after time-scaling the signal. Widely used approaches, such as the phase vocoder and waveform overlap-add (OLA), are based on a quasi-stationary assumption, and the time-scaled signals have perceivable artifacts. In contrast to these approaches, we propose applying time-varying sinusoidal modeling to TSM, without any quasi-stationary assumption. The proposed model comprises a mel-scale nonuniform-bandwidth filter bank and the instantaneous amplitude (IA) and instantaneous phase (IP) factorization of the sub-band time-varying sinusoids. TSM of the signal is done by time-scaling the IA and IP in each sub-band. The lowpass nature of the IA and IP allows for time-scaling via interpolation. Formal listening tests on speech and music (solo and polyphonic) show a reduction in TSM artifacts such as phasiness and transient smearing. Further, the proposed approach gives improved quality in comparison to waveform similarity OLA (WSOLA), phase vocoder with identity phase locking, and the recently proposed harmonic-percussive separation (HPS) based TSM methods. The obtained improvement in TSM quality highlights that speech analysis can benefit from an appropriate choice of time-varying signal models.
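
The per-sub-band step described in the abstract, time-scaling the IA and IP by interpolation, might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the mel-scale filter bank and the IA/IP estimation are omitted, the function names `interp_linear` and `stretch_subband` are hypothetical, linear interpolation stands in for whatever interpolator the paper uses, and multiplying the interpolated phase increment by the scaling factor `alpha` (so that the instantaneous frequency, and hence the pitch, is unchanged) is an assumption not stated in the abstract.

```python
import math


def interp_linear(values, position):
    """Linearly interpolate `values` at a fractional sample index."""
    last = len(values) - 1
    position = min(max(position, 0.0), float(last))  # clamp to valid range
    lo = int(math.floor(position))
    hi = min(lo + 1, last)
    frac = position - lo
    return (1.0 - frac) * values[lo] + frac * values[hi]


def stretch_subband(ia, ip, alpha):
    """Time-scale one sub-band sinusoid a[n] * cos(phi[n]).

    ia    -- instantaneous amplitude, one value per input sample
    ip    -- unwrapped instantaneous phase in radians, same length as ia
    alpha -- time-scaling factor (alpha > 1 lengthens the signal)

    Assumption: the phase increment is multiplied by alpha after
    interpolation, which keeps the instantaneous frequency unchanged.
    """
    if len(ia) != len(ip) or len(ia) < 2:
        raise ValueError("ia and ip must have equal length >= 2")
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    n_out = int(round((len(ia) - 1) * alpha)) + 1
    out = []
    for n in range(n_out):
        pos = n / alpha  # matching (fractional) position on the input axis
        amp = interp_linear(ia, pos)
        # Scale the phase advance relative to the starting phase ip[0].
        phase = ip[0] + alpha * (interp_linear(ip, pos) - ip[0])
        out.append(amp * math.cos(phase))
    return out
```

In a complete system, the time-scaled signal would be the sum of `stretch_subband` outputs over all sub-bands. For a stationary sinusoid the sketch returns a longer sinusoid at the same frequency, which is a quick sanity check on the phase scaling.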

Item Type: Conference Paper
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright of this article belongs to the Institute of Electrical and Electronics Engineers Inc.
Keywords: Locks (fasteners); Bandwidth filters; Instantaneous amplitude; Instantaneous phase; Quasi-stationary; Sinusoidal model; Temporal variation; Time varying signal; Time-scale modification; Vocoders
Department/Centre: Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited: 13 Jun 2022 05:55
Last Modified: 13 Jun 2022 05:55
URI: https://eprints.iisc.ac.in/id/eprint/73303
