
Speech Enhancement Using a Risk Estimation Approach

Sadasivan, J and Seelamantula, CS and Muraka, NR (2020) Speech Enhancement Using a Risk Estimation Approach. In: Speech Communication, 116. pp. 12-29.

Official URL: https://doi.org/10.1016/j.specom.2019.11.001


The goal in speech enhancement is to obtain an estimate of clean speech starting from the noisy signal by minimizing a chosen distortion measure (risk). Often, this results in an estimate that depends on the unknown clean signal or its statistics. Since access to such priors is limited or impractical, one has to rely on an estimate of the clean signal statistics. In this paper, we develop a risk estimation framework for speech enhancement, in which one optimizes an unbiased estimate of the risk instead of the actual risk. The estimated risk is expressed solely as a function of the noisy observations and the noise statistics. Hence, the corresponding denoiser does not require the clean speech prior. We consider several speech-specific, perceptually relevant distortion measures and develop corresponding unbiased estimates. Minimizing the risk estimates gives rise to denoisers that are nonlinear functions of the a posteriori SNR. Listening tests show that, within the risk estimation framework, the Itakura-Saito and weighted hyperbolic cosine distortions are superior to the other measures. Comparisons in terms of perceptual evaluation of speech quality (PESQ), segmental SNR (SSNR), source-to-distortion ratio (SDR), and short-time objective intelligibility (STOI) also indicate superior performance for these two distortion measures. For SNRs greater than 5 dB, the proposed approach results in better denoising performance, in terms of both objective and subjective assessment, than techniques based on the Wiener filter, log-MSE minimization, and Bayesian nonnegative matrix factorization. © 2019 Elsevier B.V.
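The risk estimation idea in the abstract can be illustrated with a minimal sketch for the simplest case, plain mean-squared error. This is not the paper's method: the perceptual distortion measures it studies are not reproduced here. The sketch assumes real-valued observations y = x + w with zero-mean white Gaussian noise of known variance sigma2, and a single scalar gain a applied as x_hat = a * y. Under those assumptions, Stein's lemma gives an unbiased estimate of the MSE that involves only y and sigma2, and minimizing it yields a gain that depends only on the a posteriori SNR. All function names below are hypothetical.

```python
import random


def mean_power(y):
    """Average power of the noisy observations."""
    return sum(v * v for v in y) / len(y)


def sure_mse(a, y, sigma2):
    """Stein's unbiased estimate of the per-sample MSE of x_hat = a * y.

    Assumes y = x + w with w ~ N(0, sigma2). Uses only y and sigma2,
    never the clean signal x.
    """
    return (a - 1.0) ** 2 * mean_power(y) + 2.0 * sigma2 * a - sigma2


def sure_optimal_gain(y, sigma2):
    """Gain minimizing sure_mse, written via the a posteriori SNR.

    Setting the derivative with respect to a to zero gives
    a = 1 - 1 / snr_post; the result is clipped at zero.
    """
    snr_post = mean_power(y) / sigma2  # a posteriori SNR (illustrative definition)
    return max(0.0, 1.0 - 1.0 / snr_post)


def demo(seed=0, n=20000, sigma2=1.0):
    """Compare the risk estimate with the true MSE on synthetic data."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 2.0) for _ in range(n)]            # synthetic "clean" signal
    y = [xi + rng.gauss(0.0, sigma2 ** 0.5) for xi in x]   # noisy observations
    a = sure_optimal_gain(y, sigma2)
    true_mse = sum((a * yi - xi) ** 2 for xi, yi in zip(x, y)) / n
    return a, sure_mse(a, y, sigma2), true_mse


if __name__ == "__main__":
    gain, estimated_risk, true_risk = demo()
    print(gain, estimated_risk, true_risk)
```

In this toy setting the estimated risk tracks the true MSE closely even though the clean signal never enters the estimate, which is the property the abstract relies on. The synthetic Gaussian "clean" signal and the single global gain are simplifications chosen only to keep the example short.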

Item Type: Journal Article
Publication: Speech Communication
Publisher: Elsevier B.V.
Additional Information: The copyright for this article belongs to Elsevier B.V.
Keywords: Factorization; Matrix algebra; Risk assessment; Signal to noise ratio; Speech enhancement; Speech intelligibility; Distortion measures; Nonlinear functions; Nonnegative matrix factorization; Perceptual distortion measure; Perceptual evaluation of speech qualities; Risk estimation; Stein's lemma; Subjective assessments; Risk perception
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 08 Feb 2023 08:57
Last Modified: 08 Feb 2023 08:57
URI: https://eprints.iisc.ac.in/id/eprint/80069
