An Optimization Framework for Recovery of Speech From Phase-Encoded Spectrograms

Sainathan, Abhilash and Rudresh, Sunil and Seelamantula, Chandra Sekhar (2018) An Optimization Framework for Recovery of Speech From Phase-Encoded Spectrograms. In: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018, 2 September 2018 through 6 September 2018, Hyderabad International Convention Centre (HICC), Hyderabad, India, pp. 741-745.

Official URL: https://doi.org/10.21437/Interspeech.2018-1987


In general, reconstruction of a speech signal from the spectrogram is non-unique because of the unavailability of the phase spectrum. Considering zero phase would result in a minimum phase reconstruction. This limitation is overcome by computing the recently introduced phase-encoded spectrogram. In this approach, one modifies each frame of a speech signal to possess the causal, delta-dominant (CDD) property prior to computing the spectrogram. In an earlier publication, we showed that finite-length CDD sequences can be retrieved exactly from their magnitude spectra using a cepstrum technique. Although exactness is guaranteed in principle, practical implementations achieve a high but limited reconstruction accuracy. In this paper, we focus on increasing the reconstruction accuracy. We formulate the reconstruction problem within an optimization framework and deploy a recently proposed iterative alternating direction method of multipliers (ADMM) algorithm called autocorrelation retrieval and Kolmogorov factorization (CoRK). Experimental validations show that the CoRK algorithm results in a reconstruction accurate up to machine precision. We also show that both the CoRK and cepstrum techniques are robust and invariant to the choice of the window duration, the amount of overlap between consecutive speech frames, the strength of the delta used to impart the CDD property, and the presence of noise.
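
To make the cepstrum-based recovery mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the CDD property is imparted by prepending an impulse whose amplitude exceeds the l1 norm of the frame, and it recovers the frame from the magnitude spectrum alone by folding the real cepstrum (the standard homomorphic minimum-phase construction, which applies because a delta-dominant sequence has all its zeros inside the unit circle). The names `make_cdd`, `cepstrum_recover`, and the `dominance` parameter are hypothetical, and a random vector stands in for a nonzero speech frame. The CoRK algorithm is not shown.

```python
import numpy as np


def make_cdd(frame, dominance=2.0):
    """Prepend an impulse so the frame becomes causal and delta-dominant.

    Assumption: the impulse amplitude is `dominance` times the frame's
    l1 norm, so |x[0]| > sum(|x[n]|, n > 0) whenever dominance > 1.
    """
    delta = dominance * np.sum(np.abs(frame))
    return np.concatenate(([delta], frame))


def cepstrum_recover(magnitude, length):
    """Recover a minimum-phase sequence from its DFT magnitude.

    Folds the real cepstrum onto non-negative quefrencies, which gives
    the complex cepstrum of the minimum-phase sequence with this magnitude.
    Expects an even number of DFT bins.
    """
    nfft = len(magnitude)
    real_cepstrum = np.fft.ifft(np.log(magnitude)).real
    folded = np.zeros(nfft)
    folded[0] = real_cepstrum[0]
    folded[1:nfft // 2] = 2.0 * real_cepstrum[1:nfft // 2]
    folded[nfft // 2] = real_cepstrum[nfft // 2]
    spectrum = np.exp(np.fft.fft(folded))
    return np.fft.ifft(spectrum).real[:length]


rng = np.random.default_rng(0)
frame = rng.standard_normal(32)            # stand-in for one speech frame
cdd = make_cdd(frame)
nfft = 4096                                # oversampled DFT limits cepstral aliasing
magnitude = np.abs(np.fft.fft(cdd, nfft))  # phase is discarded here
recovered = cepstrum_recover(magnitude, len(cdd))[1:]  # drop the impulse
print("max abs error:", np.max(np.abs(recovered - frame)))
```

In this sketch the residual error comes from cepstral aliasing at finite DFT length and from floating-point round-off, which is consistent with the abstract's remark that practical cepstrum implementations are accurate but not exact.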

Item Type: Conference Paper
Series: Interspeech
Additional Information: 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Hyderabad, India, 2-6 September 2018
Keywords: Phase-encoded spectrogram; causal delta dominant sequence; autocorrelation retrieval and Kolmogorov factorization (CoRK); cepstrum
Department/Centre: Division of Electrical Sciences > Electronic Systems Engineering (Formerly Centre for Electronic Design & Technology)
Division of Electrical Sciences > Electrical Engineering
Date Deposited: 12 Mar 2020 08:46
Last Modified: 12 Mar 2020 08:46
URI: http://eprints.iisc.ac.in/id/eprint/62916
