ePrints@IISc

Generative Maximum Entropy Learning for Multiclass Classification

Dukkipati, Ambedkar and Pandey, Gaurav and Ghoshdastidar, Debarghya and Koley, Paramita and Sriram, Satya DMV (2013) Generative Maximum Entropy Learning for Multiclass Classification. In: 2013 IEEE 13th International Conference on Data Mining (ICDM), pp. 141-150.

PDF: ieee_13th_int_con_dat_min_141_2013.pdf - Published Version (244kB; restricted to registered users)
Official URL: http://dx.doi.org/10.1109/ICDM.2013.26

Abstract

The maximum entropy approach to classification is well studied in applied statistics and machine learning, and almost all methods in the literature are discriminative in nature. In this paper, we introduce a generative maximum entropy classification method with feature selection for high-dimensional data such as text datasets. To tackle the curse of dimensionality of large datasets, we employ the conditional independence assumption (naive Bayes) and simultaneously perform feature selection by enforcing `maximum discrimination' between the estimated class-conditional densities. For two-class problems, the proposed method uses the Jeffreys (J) divergence to discriminate between the class-conditional densities. To extend the method to the multi-class case, we propose a new approach based on a multi-distribution divergence: we replace the Jeffreys divergence with the Jensen-Shannon (JS) divergence to discriminate the conditional densities of multiple classes. To reduce computational complexity, we employ a modified Jensen-Shannon divergence (JS(GM)) based on the AM-GM inequality, and we show that the resulting divergence is a natural generalization of the Jeffreys divergence to the case of multiple distributions. As theoretical justification, we show that when one selects the best features in a generative maximum entropy approach, maximum discrimination via the J-divergence emerges naturally in binary classification. The performance of the proposed algorithms is demonstrated, with comparisons, on high-dimensional text and gene-expression datasets, and the results show that our methods scale well to such data.
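To make the divergences concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the function names are ours, and the construction of JS(GM) as a comparison of each class-conditional density against the unnormalized weighted geometric mean of all of them is our reading of the AM-GM substitution described in the abstract.

import numpy as np

def kl(p, q):
    """KL divergence KL(p || q) for discrete distributions (strictly positive entries assumed)."""
    return float(np.sum(p * np.log(p / q)))

def jeffreys(p, q):
    """Jeffreys (J) divergence: symmetrized KL, J(p, q) = KL(p||q) + KL(q||p)."""
    return kl(p, q) + kl(q, p)

def js_gm(dists, weights=None):
    """Geometric-mean Jensen-Shannon variant (illustrative reconstruction):
    each distribution is compared against the weighted *geometric* mean of
    all of them (the AM-GM substitution), rather than against the
    arithmetic-mean mixture used by ordinary JS divergence.
    """
    dists = np.asarray(dists, dtype=float)
    if weights is None:
        weights = np.full(len(dists), 1.0 / len(dists))
    weights = np.asarray(weights, dtype=float)
    # Weighted geometric mean, computed in log space; left unnormalized.
    gm = np.exp(np.sum(weights[:, None] * np.log(dists), axis=0))
    return float(sum(w * kl(p, gm) for w, p in zip(weights, dists)))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
print(jeffreys(p, q))        # symmetric discrimination between two class densities
print(4 * js_gm([p, q]))     # equals jeffreys(p, q) in the two-class case

Under this reading, for two equally weighted distributions the geometric-mean variant evaluates to J(p, q)/4, which is the sense in which JS(GM) generalizes the Jeffreys divergence to multiple classes.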

Item Type: Conference Paper
Publication: 2013 IEEE 13th International Conference on Data Mining (ICDM)
Series: IEEE International Conference on Data Mining
Publisher: IEEE
Additional Information: Copyright for this article belongs to IEEE.
Keywords: Maximum Entropy; Jeffreys Divergence; Jensen-Shannon Divergence; Text Categorization
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 26 May 2014 11:24
Last Modified: 26 May 2014 11:24
URI: http://eprints.iisc.ac.in/id/eprint/48966
