ePrints@IISc

Generalized Coupled Dictionary Learning Approach With Applications to Cross-Modal Matching

Mandal, Devraj and Biswas, Soma (2016) Generalized Coupled Dictionary Learning Approach With Applications to Cross-Modal Matching. In: IEEE TRANSACTIONS ON IMAGE PROCESSING, 25 (8). pp. 3826-3837.

PDF: IEEE_Tra_Ima_Pro_25-8_3826_2016.pdf - Published Version (restricted to registered users)
Official URL: http://dx.doi.org/10.1109/TIP.2016.2577885


Coupled dictionary learning (CDL) has recently emerged as a powerful technique with a wide variety of applications, ranging from image synthesis to classification tasks. In this paper, we extend the existing CDL approaches in two aspects to make them more suitable for the task of cross-modal matching. Data coming from different modalities may or may not be paired: for an image-text retrieval problem, for example, 100 images of a class may be available for training as opposed to only 50 text samples. Current CDL approaches are not designed to handle such scenarios, where classes of data points in one modality correspond to classes of data points in the other modality. Given the data from the two modalities, first, two dictionaries are learnt for the respective modalities, so that the data have a sparse representation with respect to their own dictionaries. Then, the sparse coefficients from the two modalities are transformed in such a manner that data from the same class are maximally correlated, while data from different classes have minimal correlation. This way of modeling the coupling between the sparse representations of the two modalities makes the approach work seamlessly for paired as well as unpaired data. The discriminative coupling term also makes the approach better suited for classification tasks. Experiments on different publicly available cross-modal data sets, namely, the CUHK photo-sketch face data set, the HFB visible and near-infrared facial images data set, the IXMAS multi-view action recognition data set, the Wiki image and text data set, and the Multiple Features data set, show that this generalized CDL approach performs better than the state-of-the-art for both paired and unpaired data.
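The pipeline described in the abstract — learn a dictionary per modality via sparse coding, then couple the resulting coefficients at the class level so unpaired data can still be matched — can be sketched roughly as follows. This is an illustrative toy implementation, not the paper's actual optimization: the ISTA solver, the least-squares dictionary update, and the class-mean coupling (a simple stand-in for the paper's discriminative correlation term) are all the sketch's own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_code(X, D, lam=0.1, n_iter=100):
    """ISTA: approximately minimise ||X - D A||_F^2 + lam ||A||_1 over codes A."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        G = A - (D.T @ (D @ A - X)) / L    # gradient step
        A = np.sign(G) * np.maximum(np.abs(G) - lam / L, 0.0)  # soft threshold
    return A

def learn_dictionary(X, n_atoms, n_outer=10):
    """Alternate sparse coding with a least-squares dictionary update."""
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):
        A = sparse_code(X, D)
        D = X @ np.linalg.pinv(A)          # closed-form dictionary update
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D

# Unpaired toy data, echoing the abstract's example: 100 "image" samples
# vs. 50 "text" samples, two classes, different feature dimensions.
y1 = np.repeat([0, 1], 50)                  # labels, modality 1
y2 = np.repeat([0, 1], 25)                  # labels, modality 2
X1 = rng.standard_normal((20, 100)) + y1 * 2.0
X2 = rng.standard_normal((30, 50)) + y2 * 2.0

# Step 1: a dictionary per modality, so each modality is sparse in its own basis.
D1 = learn_dictionary(X1, n_atoms=15)
D2 = learn_dictionary(X2, n_atoms=15)
A1 = sparse_code(X1, D1)
A2 = sparse_code(X2, D2)

# Step 2 (stand-in coupling): map class means of modality-2 codes onto those
# of modality-1 codes. Because only class statistics are matched, no sample
# pairing is required -- the property the abstract highlights.
M1 = np.stack([A1[:, y1 == c].mean(axis=1) for c in (0, 1)], axis=1)
M2 = np.stack([A2[:, y2 == c].mean(axis=1) for c in (0, 1)], axis=1)
W, *_ = np.linalg.lstsq(M2.T, M1.T, rcond=None)  # code-space map: modality 2 -> 1
```

At retrieval time, a query from modality 2 would be sparse-coded against `D2`, mapped through `W`, and matched to modality-1 codes by correlation; the paper's actual coupling term additionally pushes different-class codes apart, which this simple class-mean regression does not.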

Item Type: Journal Article
Additional Information: Copyright for this article belongs to the IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 445 HOES LANE, PISCATAWAY, NJ 08855-4141 USA
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 30 Aug 2016 10:28
Last Modified: 30 Aug 2016 10:28
URI: http://eprints.iisc.ac.in/id/eprint/54616
