ePrints@IISc

Learning to Switch off, Switch on, and Integrate Modalities in Large Pre-trained Transformers

Duseja, T and Annervaz, KM and Duggani, J and Zacharia, S and Free, M and Dukkipati, A (2024) Learning to Switch off, Switch on, and Integrate Modalities in Large Pre-trained Transformers. In: 7th IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2024, 7 August 2024 through 9 August 2024, San Jose, pp. 403-409.

PDF: Lea_Swi_Swi_Int_Mod_Lar_Pre_Tra_2024.pdf - Published Version (restricted to registered users)
Official URL: https://doi.org/10.1109/MIPR62202.2024.00070

Abstract

Transformer models, which revolutionized foundation models, are now ubiquitous. Consequently, there has been a surge in pre-trained transformers that can be fine-tuned for different downstream tasks. Most pre-trained transformers are trained on a single modality, and there is no direct way to fine-tune them on multiple modalities. To tackle this issue, in this paper we propose a general-purpose gate, SSIM (Switch off, Switch on, and Integrate Modalities), by which one can integrate other modalities into large pre-trained language transformers. The proposed SSIM gate obtains a unified representation by soft-switching between multi-modal interactions. To evaluate our approach, we establish benchmarks using pre-trained language transformers such as BERT, XLNet, and T5 on multi-modal tasks such as sentiment and emotion analysis (CMU-MOSI, CMU-MOSEI), emotion recognition in conversations (IEMOCAP, MELD), and multimodal intent recognition (MIntRec), achieving results close to the state of the art. © 2024 IEEE.
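The abstract gives no implementation details, but the mechanism it describes suggests a learned gate that blends a language transformer's hidden states with features from another modality. Below is a minimal PyTorch sketch of such a soft-switching gate; all names and dimensions (SSIMGate, text_dim, other_dim, the 74-dim audio features) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class SSIMGate(nn.Module):
    """Hypothetical sketch of a soft-switching gate that fuses a non-text
    modality into a pre-trained language transformer's hidden states.
    Names and shapes are illustrative, not taken from the paper."""

    def __init__(self, text_dim: int, other_dim: int):
        super().__init__()
        self.project = nn.Linear(other_dim, text_dim)  # map modality into text space
        self.gate = nn.Linear(2 * text_dim, text_dim)  # per-dimension soft switch

    def forward(self, text_h: torch.Tensor, other_h: torch.Tensor) -> torch.Tensor:
        # text_h:  (batch, seq_len, text_dim) hidden states from e.g. BERT
        # other_h: (batch, seq_len, other_dim) time-aligned audio/visual features
        other_p = self.project(other_h)
        g = torch.sigmoid(self.gate(torch.cat([text_h, other_p], dim=-1)))
        # g ~ 0 switches the extra modality off, g ~ 1 switches it on,
        # and intermediate values integrate (blend) the two modalities.
        return g * other_p + (1.0 - g) * text_h

# Illustrative usage with BERT-sized hidden states and hypothetical 74-dim audio features.
gate = SSIMGate(text_dim=768, other_dim=74)
text_h = torch.randn(2, 50, 768)
audio_h = torch.randn(2, 50, 74)
fused = gate(text_h, audio_h)  # (2, 50, 768), ready to feed to later layers
```

Because the gate's output matches the text hidden-state shape, a sketch like this could in principle be inserted between layers of a frozen or fine-tuned pre-trained transformer without altering its architecture elsewhere.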

Item Type: Conference Paper
Publication: Proceedings of the International Conference on Multimedia Information Processing and Retrieval, MIPR
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to the publishers.
Keywords: Benchmarking; Distribution transformers; Gesture recognition; Modal analysis; Problem oriented languages; Speech recognition; Unified Modeling Language, Down-stream; Emotion recognition; Foundation models; Multi-modal; Multi-modal emotion recognition; Multiple modalities; Pre-trained model; Sentiment analysis; Switch-on; Transformer modeling, Emotion Recognition
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 06 Dec 2024 17:09
Last Modified: 06 Dec 2024 17:09
URI: http://eprints.iisc.ac.in/id/eprint/87106
