Duseja, T and Annervaz, KM and Duggani, J and Zacharia, S and Free, M and Dukkipati, A (2024) Learning to Switch off, Switch on, and Integrate Modalities in Large Pre-trained Transformers. In: 7th IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2024, 7–9 August 2024, San Jose, pp. 403-409.
Abstract
Transformer models, which revolutionized foundation models, are now ubiquitous. Hence, there has been a surge in pre-trained transformers that can be fine-tuned to perform different downstream tasks. Most pre-trained transformers are trained on a single modality only, and there is no direct way to fine-tune them on multiple modalities. To tackle this issue, in this paper, we propose a general-purpose gate, SSIM (Switch off, Switch on, and Integrate Modalities), by which one can integrate other modalities into large pre-trained language transformers. The proposed SSIM gate helps to obtain a unified representation by soft-switching between multi-modal interactions. To evaluate our approach, we have established benchmarks using pre-trained language transformers such as BERT, XLNet, and T5 on multi-modal tasks including Sentiment and Emotion analysis (CMU-MOSI, CMU-MOSEI), Emotion Recognition in Conversations (IEMOCAP, MELD), and Multimodal Intent Recognition (MIntRec), achieving close to state-of-the-art results. © 2024 IEEE.
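The paper itself does not reproduce the gate's implementation in this record, but the abstract's idea of "soft-switching" between modalities can be sketched as a learned sigmoid gate that blends a text representation with another modality's representation. The following is a minimal, hypothetical sketch: the function name `ssim_gate` and the parameters `W_g` and `b_g` are illustrative assumptions, not the authors' published code.

```python
import numpy as np


def ssim_gate(text_h, other_h, W_g, b_g):
    """Hypothetical soft-switch gate (illustrative, not the paper's code).

    text_h:  (d,) hidden state from the pre-trained language transformer
    other_h: (d,) representation of the other modality (audio/visual)
    W_g:     (2d, d) learned gate weights; b_g: (d,) learned gate bias
    Returns an elementwise convex combination of the two representations.
    """
    z = np.concatenate([text_h, other_h], axis=-1)
    # Sigmoid produces a gate value in (0, 1) per dimension: near 0 the
    # modality is "switched off", near 1 it is "switched on".
    g = 1.0 / (1.0 + np.exp(-(z @ W_g + b_g)))
    return g * other_h + (1.0 - g) * text_h
```

Because the gate is a convex combination per dimension, each fused coordinate lies between the corresponding text and other-modality coordinates, which is one way to read "soft-switching" as opposed to a hard on/off selection.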
| Item Type: | Conference Paper |
| --- | --- |
| Publication: | Proceedings of the International Conference on Multimedia Information Processing and Retrieval, MIPR |
| Publisher: | Institute of Electrical and Electronics Engineers Inc. |
| Additional Information: | The copyright for this article belongs to the publishers. |
| Keywords: | Benchmarking; Distribution transformers; Gesture recognition; Modal analysis; Problem oriented languages; Speech recognition; Unified Modeling Language; Down-stream; Emotion recognition; Foundation models; Multi-modal; Multi-modal emotion recognition; Multiple modalities; Pre-trained model; Sentiment analysis; Switch-on; Transformer modeling; Emotion Recognition |
| Department/Centre: | Division of Electrical Sciences > Computer Science & Automation |
| Date Deposited: | 06 Dec 2024 17:09 |
| Last Modified: | 06 Dec 2024 17:09 |
| URI: | http://eprints.iisc.ac.in/id/eprint/87106 |