Mannem, R and Ghosh, PK (2021) A deep neural network based correction scheme for improved air-tissue boundary prediction in real-time magnetic resonance imaging video. In: Computer Speech and Language, 66.
Abstract
Real-time Magnetic Resonance Imaging (rtMRI) video captures vocal tract movements in the mid-sagittal plane during speech. Air-tissue boundaries (ATBs) are contours that trace the transition between the high-intensity tissue of the speech articulators and the low-intensity airway cavity in the rtMRI video. ATB segmentation of an rtMRI video is a common preprocessing step in many speech production and speech processing applications. However, it is very challenging because rtMRI images have low resolution and a low signal-to-noise ratio. Several approaches to accurate ATB segmentation have been proposed in the literature, but every technique, whether knowledge-based or data-driven, has limitations that stem from its model assumptions or from data quality. The errors in the ATBs predicted by a typical segmentation approach can be corrected in a data-driven manner as a post-processing step. In this work, we propose a deep neural network (DNN) based correction scheme for improving ATB segmentation. In this scheme, each point on a predicted ATB is corrected using the pattern of intensity variation along the normal to the predicted ATB at that point. The inputs and target outputs needed for DNN training are generated using a normal-grid based method. Experimental results show that, in terms of Dynamic Time Warping (DTW) distance, the proposed DNN-based correction yields more accurate ATBs than the segmentation approaches it is applied to. Thus, the DNN-based correction could be used as a post-processing step to improve the accuracy of the ATBs predicted by any segmentation scheme. © 2020 Elsevier Ltd
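The two geometric ideas in the abstract, reading intensity along the normal to a predicted contour and scoring contours by DTW distance, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`contour_normals`, `normal_profiles`, `dtw_distance`), the `half_len` parameter, the nearest-pixel sampling, and the unnormalized Euclidean DTW cost are all assumptions made for illustration, and the paper's normal-grid construction and DTW variant may differ.

```python
import numpy as np

def contour_normals(contour):
    """Unit normals of an open 2-D contour given as an (N, 2) array of (x, y) points."""
    tangents = np.gradient(contour, axis=0)  # finite differences along the contour
    # Rotate each tangent by 90 degrees to obtain a normal direction.
    normals = np.stack([-tangents[:, 1], tangents[:, 0]], axis=1)
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.maximum(lengths, 1e-12)

def normal_profiles(image, contour, half_len=3):
    """Sample image intensity along the normal at each contour point.

    Assumption: nearest-pixel sampling at unit steps; returns an
    (N, 2 * half_len + 1) array, one intensity profile per contour point.
    """
    offsets = np.arange(-half_len, half_len + 1)
    normals = contour_normals(contour)
    # pts[i, k] = contour[i] + offsets[k] * normals[i]
    pts = contour[:, None, :] + offsets[None, :, None] * normals[:, None, :]
    cols = np.clip(np.rint(pts[..., 0]).astype(int), 0, image.shape[1] - 1)
    rows = np.clip(np.rint(pts[..., 1]).astype(int), 0, image.shape[0] - 1)
    return image[rows, cols]

def dtw_distance(a, b):
    """Accumulated DTW cost between two contours (Euclidean point distance, unnormalized)."""
    n, m = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]
```

In a correction scheme of the kind described, each row returned by `normal_profiles` would be a candidate network input for one contour point, and `dtw_distance` would compare a predicted or corrected contour against a reference contour.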
Item Type: | Journal Article |
---|---|
Publication: | Computer Speech and Language |
Publisher: | Academic Press |
Additional Information: | Copyright to this article belongs to Academic Press |
Keywords: | Deep neural networks; Image segmentation; Knowledge based systems; Magnetic resonance imaging; Signal to noise ratio; Speech processing; Tissue; Correction approaches; Dynamic time warping; Intensity variations; Low signal-to-noise ratio; Pre-processing step; Processing applications; Segmentation techniques; Vocal tract movements; Neural networks |
Department/Centre: | Division of Electrical Sciences > Electrical Engineering |
Date Deposited: | 03 Mar 2021 05:40 |
Last Modified: | 03 Mar 2021 05:40 |
URI: | http://eprints.iisc.ac.in/id/eprint/66938 |