
Clean speech AE-DNN PSD constraint for MCLP based reverberant speech enhancement

Chetupalli, SR and Sreenivas, TV (2019) Clean speech AE-DNN PSD constraint for MCLP based reverberant speech enhancement. In: 27th European Signal Processing Conference, 2-6 September 2019, A Coruña, Spain.

ESPC_2019.pdf - Published Version
Restricted to Registered users only

Official URL: https://doi.org/10.23919/EUSIPCO.2019.8902710

Abstract

Blind inverse filtering using multi-channel linear prediction (MCLP) in the short-time Fourier transform (STFT) domain is an effective means to enhance reverberant speech. Traditionally, a speech power spectral density (PSD) weighted prediction error (WPE) minimization approach is used to estimate the prediction filters independently in each frequency bin. The method is sensitive to the estimate of the desired signal PSD. In this paper, we propose an auto-encoder (AE) deep neural network (DNN) based constraint for the estimation of the desired signal PSD. An auto-encoder trained on clean speech STFT coefficients is used as the prior to non-linearly map the natural speech PSD. We explore two different architectures for the auto-encoder: (i) a fully-connected (FC) feed-forward architecture, and (ii) a recurrent long short-term memory (LSTM) architecture. Experiments using real room impulse responses show that the LSTM-DNN based PSD estimate performs better than the traditional methods for reverberant signal enhancement. © 2019 IEEE
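
The conventional WPE iteration that the abstract refers to can be sketched for a single frequency bin as follows. This is a minimal illustration of the standard alternating scheme (estimate the PSD, then solve a PSD-weighted least-squares problem for the prediction filter), not the authors' implementation. The function names, the default values for `delay`, `order`, and `iterations`, the small diagonal regularizer, and the `psd_fn` hook are all assumptions made for the example. The hook only marks the place where a different PSD estimator, such as an AE-DNN mapping, could be plugged in; the network itself is not shown.

```python
import numpy as np

def build_delayed_stack(X, delay, order):
    """Stack `order` past frames of every channel, starting `delay` frames back.

    X: complex array of shape (channels, frames) for one frequency bin.
    Returns an array of shape (channels * order, frames).
    """
    M, T = X.shape
    stacked = np.zeros((M * order, T), dtype=complex)
    for k in range(order):
        shift = delay + k
        if shift < T:
            stacked[k * M:(k + 1) * M, shift:] = X[:, :T - shift]
    return stacked

def wpe_one_bin(X, delay=2, order=10, iterations=3, eps=1e-8, psd_fn=None):
    """Illustrative WPE for one frequency bin; channel 0 is the reference.

    psd_fn is a hypothetical hook: given the current desired-signal estimate,
    it returns a PSD estimate per frame. By default |d(t)|^2 is used.
    """
    ref = X[0]
    Xt = build_delayed_stack(X, delay, order)
    d = ref.copy()  # initial desired-signal estimate: the observed reference
    for _ in range(iterations):
        psd = np.abs(d) ** 2 if psd_fn is None else psd_fn(d)
        psd = np.maximum(psd, eps)           # floor to avoid division by zero
        Xw = Xt / psd                        # PSD-weighted delayed observations
        R = Xw @ Xt.conj().T                 # weighted correlation matrix
        r = Xw @ ref.conj()                  # weighted cross-correlation vector
        # Small diagonal loading (an assumption) keeps the solve well-posed.
        g = np.linalg.solve(R + eps * np.eye(R.shape[0]), r)
        d = ref - g.conj() @ Xt              # prediction residual = enhanced signal
    return d
```

Each STFT frequency bin would be processed independently with a call such as `wpe_one_bin(X_bin)`, where `X_bin` holds the multi-channel STFT coefficients of that bin.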

Item Type: Conference Paper
Publication: European Signal Processing Conference
Publisher: European Signal Processing Conference, EUSIPCO
Additional Information: The copyright for this article belongs to European Signal Processing Conference, EUSIPCO
Keywords: Channel coding; Forecasting; Inverse problems; Long short-term memory; Mathematical transformations; Network architecture; Power spectral density; Reverberation; Signal analysis; Signal encoding; Spectral density; Speech enhancement; Auto encoders; Dereverberation; Linear prediction; Power spectral densities (PSD); Prior; Reverberant speech enhancements; Room impulse response; Short time Fourier transforms; Deep neural networks
Department/Centre: Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited: 06 Jan 2023 09:38
Last Modified: 06 Jan 2023 09:38
URI: https://eprints.iisc.ac.in/id/eprint/78835
