ePrints@IISc

Learning to Switch off, Switch on, and Integrate Modalities in Large Pre-trained Transformers

Duseja, T and Annervaz, KM and Duggani, J and Zacharia, S and Free, M and Dukkipati, A (2024) Learning to Switch off, Switch on, and Integrate Modalities in Large Pre-trained Transformers. In: 7th IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2024, 7 August 2024 through 9 August 2024, San Jose, pp. 403-409.

PDF: Lea_Swi_Swi_Int_Mod_Lar_Pre_Tra_2024.pdf - Published Version (restricted to registered users)
Official URL: https://doi.org/10.1109/MIPR62202.2024.00070

Abstract

Transformer models, which revolutionized foundation models, are now ubiquitous. Consequently, there has been a surge in pre-trained transformers that can be fine-tuned for different downstream tasks. Most pre-trained transformers are trained on a single modality, and there is no direct way to fine-tune them on multiple modalities. To tackle this issue, in this paper we propose a general-purpose gate, SSIM (Switch off, Switch on, and Integrate Modalities), by which one can integrate other modalities into large pre-trained language transformers. The proposed SSIM gate obtains a unified representation by soft-switching between multi-modal interactions. To evaluate our approach, we establish benchmarks using pre-trained language transformers such as BERT, XLNet, and T5 on multi-modal tasks such as sentiment and emotion analysis (CMU-MOSI, CMU-MOSEI), emotion recognition in conversations (IEMOCAP, MELD), and multimodal intent recognition (MIntRec), achieving results close to the state of the art. © 2024 IEEE.
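The abstract gives no implementation details, but the mechanism it describes suggests a learned gate that blends a language transformer's hidden states with features from another modality. Below is a minimal PyTorch sketch of such a soft-switching gate; all names and dimensions (SSIMGate, text_dim, other_dim, the 74-dim audio features) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class SSIMGate(nn.Module):
    """Hypothetical sketch of a soft-switching gate that fuses a non-text
    modality into a pre-trained language transformer's hidden states.
    Names and shapes are illustrative, not taken from the paper."""

    def __init__(self, text_dim: int, other_dim: int):
        super().__init__()
        self.project = nn.Linear(other_dim, text_dim)  # map modality into text space
        self.gate = nn.Linear(2 * text_dim, text_dim)  # per-dimension soft switch

    def forward(self, text_h: torch.Tensor, other_h: torch.Tensor) -> torch.Tensor:
        # text_h:  (batch, seq_len, text_dim) hidden states from e.g. BERT
        # other_h: (batch, seq_len, other_dim) time-aligned audio/visual features
        other_p = self.project(other_h)
        g = torch.sigmoid(self.gate(torch.cat([text_h, other_p], dim=-1)))
        # g ~ 0 switches the extra modality off, g ~ 1 switches it on,
        # and intermediate values integrate (blend) the two modalities.
        return g * other_p + (1.0 - g) * text_h

# Illustrative usage with BERT-sized hidden states and hypothetical 74-dim audio features.
gate = SSIMGate(text_dim=768, other_dim=74)
text_h = torch.randn(2, 50, 768)
audio_h = torch.randn(2, 50, 74)
fused = gate(text_h, audio_h)  # (2, 50, 768), ready to feed to later layers
```

Because the gate's output matches the text hidden-state shape, a sketch like this could in principle be inserted between layers of a frozen or fine-tuned pre-trained transformer without altering its architecture elsewhere.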

Item Type: Conference Paper
Publication: Proceedings of the International Conference on Multimedia Information Processing and Retrieval, MIPR
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to the publishers.
Keywords: Benchmarking; Distribution transformers; Gesture recognition; Modal analysis; Problem oriented languages; Speech recognition; Unified Modeling Language, Down-stream; Emotion recognition; Foundation models; Multi-modal; Multi-modal emotion recognition; Multiple modalities; Pre-trained model; Sentiment analysis; Switch-on; Transformer modeling, Emotion Recognition
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 06 Dec 2024 17:09
Last Modified: 06 Dec 2024 17:09
URI: http://eprints.iisc.ac.in/id/eprint/87106
