Mondal, AK and Sailopal, A and Singla, P and Ap, P (2022) SSDMM-VAE: variational multi-modal disentangled representation learning. In: Applied Intelligence.
PDF: app_int_2022.pdf (Published Version; restricted to registered users)
Abstract
Multi-modal learning aims to simultaneously model data from several modalities such as image, text and speech. The goal is to learn representations that are disentangled, so that a variety of downstream tasks such as causal reasoning, fair ML and domain adaptation are well supported. In this work, we propose a novel semi-supervised method to learn disentangled representations for multi-modal data using variational inference. We incorporate a two-component latent space in a Variational Auto-Encoder (VAE) that comprises domain-invariant (shared) and domain-specific (private) representations across modalities, each partitioned into discrete and continuous components. We combine the shared continuous and discrete latent spaces via a Product-of-Experts and a statistical ensemble, respectively. We conduct several experiments on multiple multimodal datasets (dSprite-Text, Shaped3D-Text) to demonstrate the efficacy of the proposed method for learning disentangled representations. The proposed method achieves state-of-the-art FactorVAE scores (0.93 and 1.00, respectively), surpassing various unimodal and multimodal baselines. Further, we demonstrate the benefits of learning disentangled joint representations on several downstream tasks (generation and classification) using the MNIST-MADBase dataset, with a joint coherence score of 96.95. Overall, we demonstrate the use of variational inference for disentangled joint representation learning in a semi-supervised multimodal setting and its benefits in various downstream tasks.
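As a rough illustration of the fusion step described in the abstract, the sketch below shows how per-modality posteriors over a shared latent could be combined: a Product-of-Experts for a diagonal-Gaussian continuous latent and a simple ensemble (average) of categorical distributions for a discrete latent. This is a minimal sketch under those assumptions, not the authors' implementation; all function names, shapes and the inclusion of a standard-normal prior expert are illustrative choices.

```python
# Minimal sketch (not the paper's code): fusing per-modality posteriors over
# shared latents, assuming diagonal-Gaussian continuous posteriors and
# categorical discrete posteriors.
import torch

def product_of_experts(mus, logvars, eps=1e-8):
    """Product-of-Experts over Gaussians q_m(z|x_m) = N(mu_m, var_m).

    mus, logvars: (num_modalities, batch, latent_dim).
    A standard-normal prior expert N(0, I) is included, so the joint posterior
    is defined even if some modalities are missing.
    """
    prior_mu = torch.zeros_like(mus[:1])
    prior_logvar = torch.zeros_like(logvars[:1])
    mus = torch.cat([prior_mu, mus], dim=0)
    logvars = torch.cat([prior_logvar, logvars], dim=0)

    precisions = torch.exp(-logvars) + eps           # 1 / var_m per expert
    joint_var = 1.0 / precisions.sum(dim=0)          # combined variance
    joint_mu = joint_var * (precisions * mus).sum(dim=0)
    return joint_mu, torch.log(joint_var)

def ensemble_of_categoricals(logits, eps=1e-8):
    """Ensemble of per-modality categorical posteriors by averaging probabilities.

    logits: (num_modalities, batch, num_categories).
    """
    probs = torch.softmax(logits, dim=-1).mean(dim=0)
    return torch.log(probs + eps)                    # joint categorical logits

# Usage: two modalities, batch of 4, 10-dim continuous and 3-way discrete latent.
mus, logvars = torch.randn(2, 4, 10), torch.randn(2, 4, 10)
joint_mu, joint_logvar = product_of_experts(mus, logvars)
joint_disc_logits = ensemble_of_categoricals(torch.randn(2, 4, 3))
```

The Gaussian product has a closed form (precisions add, means are precision-weighted), which is why Product-of-Experts is a common choice for fusing continuous modality posteriors; averaging the categorical distributions is one simple way to realise an ensemble over the discrete component.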
Item Type: | Journal Article |
---|---|
Publication: | Applied Intelligence |
Publisher: | Springer |
Additional Information: | The copyright for this article belongs to Springer. |
Keywords: | Classification (of information); Learning systems; Modal analysis; Auto-encoders; Disentangled representation learning; Downstream tasks; Image-text; Modelling data; Multi-modal; Multi-modal learning; Multimodal variational auto-encoder; Variational inference; Supervised learning |
Department/Centre: | Division of Electrical Sciences > Electrical Communication Engineering |
Date Deposited: | 10 Aug 2022 06:42 |
Last Modified: | 10 Aug 2022 06:42 |
URI: | https://eprints.iisc.ac.in/id/eprint/75803 |