Mondal, AK and Sailopal, A and Singla, P and Ap, P (2022) SSDMM-VAE: variational multi-modal disentangled representation learning. In: Applied Intelligence.
PDF: app_int_2022.pdf (Published Version; restricted to registered users)
Abstract
Multi-modal learning aims to simultaneously model data from several modalities such as image, text and speech. The goal is to learn representations that are disentangled, so that a variety of downstream tasks such as causal reasoning, fair ML and domain adaptation are well supported. In this work, we propose a novel semi-supervised method to learn disentangled representations for multi-modal data using variational inference. We incorporate a two-component latent space in a Variational Auto-Encoder (VAE) that comprises domain-invariant (shared) and domain-specific (private) representations across modalities, each partitioned into discrete and continuous components. We combine the shared continuous and discrete latent spaces via a Product-of-Experts and a statistical ensemble, respectively. We conduct several experiments on multiple multimodal datasets (dSprite-Text, Shaped3D-Text) to demonstrate the efficacy of the proposed method for learning disentangled representations. The proposed method achieves state-of-the-art FactorVAE scores (0.93 and 1.00, respectively), surpassing various unimodal and multimodal baselines. Further, we demonstrate the benefits of learning disentangled joint representations on several downstream tasks (generation and classification) using the MNIST-MADBase dataset, with a joint coherence score of 96.95. Overall, we demonstrate the use of variational inference for disentangled joint representation learning in a semi-supervised multimodal setting and its benefits in various downstream tasks.
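As a rough illustration of the fusion step described in the abstract, the sketch below shows how per-modality posteriors over a shared latent could be combined: a Product-of-Experts for a diagonal-Gaussian continuous latent and a simple ensemble (average) of categorical distributions for a discrete latent. This is a minimal sketch under those assumptions, not the authors' implementation; all function names, shapes and the inclusion of a standard-normal prior expert are illustrative choices.

```python
# Minimal sketch (not the paper's code): fusing per-modality posteriors over
# shared latents, assuming diagonal-Gaussian continuous posteriors and
# categorical discrete posteriors.
import torch

def product_of_experts(mus, logvars, eps=1e-8):
    """Product-of-Experts over Gaussians q_m(z|x_m) = N(mu_m, var_m).

    mus, logvars: (num_modalities, batch, latent_dim).
    A standard-normal prior expert N(0, I) is included, so the joint posterior
    is defined even if some modalities are missing.
    """
    prior_mu = torch.zeros_like(mus[:1])
    prior_logvar = torch.zeros_like(logvars[:1])
    mus = torch.cat([prior_mu, mus], dim=0)
    logvars = torch.cat([prior_logvar, logvars], dim=0)

    precisions = torch.exp(-logvars) + eps           # 1 / var_m per expert
    joint_var = 1.0 / precisions.sum(dim=0)          # combined variance
    joint_mu = joint_var * (precisions * mus).sum(dim=0)
    return joint_mu, torch.log(joint_var)

def ensemble_of_categoricals(logits, eps=1e-8):
    """Ensemble of per-modality categorical posteriors by averaging probabilities.

    logits: (num_modalities, batch, num_categories).
    """
    probs = torch.softmax(logits, dim=-1).mean(dim=0)
    return torch.log(probs + eps)                    # joint categorical logits

# Usage: two modalities, batch of 4, 10-dim continuous and 3-way discrete latent.
mus, logvars = torch.randn(2, 4, 10), torch.randn(2, 4, 10)
joint_mu, joint_logvar = product_of_experts(mus, logvars)
joint_disc_logits = ensemble_of_categoricals(torch.randn(2, 4, 3))
```

The Gaussian product has a closed form (precisions add, means are precision-weighted), which is why Product-of-Experts is a common choice for fusing continuous modality posteriors; averaging the categorical distributions is one simple way to realise an ensemble over the discrete component.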
Item Type: | Journal Article |
---|---|
Publication: | Applied Intelligence |
Publisher: | Springer |
Additional Information: | The copyright for this article belongs to Springer. |
Keywords: | Classification (of information); Learning systems; Modal analysis; Auto-encoders; Disentangled representation learning; Downstream tasks; Image-text; Modelling data; Multi-modal; Multi-modal learning; Multimodal variational auto-encoder; Variational inference; Supervised learning |
Department/Centre: | Division of Electrical Sciences > Electrical Communication Engineering |
Date Deposited: | 10 Aug 2022 06:42 |
Last Modified: | 10 Aug 2022 06:42 |
URI: | https://eprints.iisc.ac.in/id/eprint/75803 |