
Efficient classification using phrases generated by topic models

Gujraniya, Deepak and Murty, Narsimha M (2012) Efficient classification using phrases generated by topic models. In: 21st International Conference on Pattern Recognition (ICPR 2012), 11-15 Nov. 2012, Tsukuba, Japan.

Official URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumb...


Many popular models are available for document classification, such as the Naïve Bayes classifier, k-Nearest Neighbors and Support Vector Machines. In all of these, documents are represented with the “Bag of words” model, which does not capture the semantic meaning a word carries in a particular document. Semantics are better captured by the proximity of words and by their occurrence in the document. We propose a new “Bag of Phrases” model to capture the discriminative power of phrases for text classification. We present a novel algorithm that extracts phrases from the corpus using the well-known topic model Latent Dirichlet Allocation (LDA) and integrates them into the vector space model for classification. Experiments show that classifiers perform better with the new Bag of Phrases model than with related representation models.
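The abstract does not spell out the phrase-extraction algorithm. Below is a minimal sketch of one plausible reading, assuming every token already carries a topic label from a fitted LDA model: runs of adjacent tokens that share a topic are treated as phrases, and phrase counts are added to ordinary word counts in a single vector-space representation. The functions `extract_phrases` and `bag_of_phrases`, the `max_len` cap and the run-merging rule are hypothetical illustrations, not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# (word, topic) pairs; topics are ASSUMED to come from an already-fitted LDA model.
Tagged = Sequence[Tuple[str, int]]


def extract_phrases(tagged: Tagged, max_len: int = 3) -> List[str]:
    """HYPOTHETICAL heuristic: join runs of 2..max_len adjacent tokens
    that were assigned the same LDA topic into one phrase."""
    phrases: List[str] = []
    run: List[str] = []
    prev_topic = None
    for word, topic in tagged:
        if topic == prev_topic and len(run) < max_len:
            run.append(word)  # extend the current same-topic run
        else:
            if len(run) >= 2:  # flush a finished run as a phrase
                phrases.append(" ".join(run))
            run = [word]  # start a new run at this token
        prev_topic = topic
    if len(run) >= 2:  # flush the final run
        phrases.append(" ".join(run))
    return phrases


def bag_of_phrases(tagged: Tagged, vocab: Dict[str, int]) -> List[int]:
    """Count single words plus extracted phrases over one shared vocabulary;
    terms missing from the vocabulary are ignored."""
    counts = Counter(word for word, _ in tagged)
    counts.update(extract_phrases(tagged))
    vec = [0] * len(vocab)
    for term, c in counts.items():
        if term in vocab:
            vec[vocab[term]] += c
    return vec


# Usage example with made-up topic labels.
doc = [("topic", 2), ("model", 2), ("for", 0), ("text", 5), ("classification", 5)]
vocab = {"topic": 0, "model": 1, "text": 2, "classification": 3,
         "topic model": 4, "text classification": 5}
print(extract_phrases(doc))       # ['topic model', 'text classification']
print(bag_of_phrases(doc, vocab))  # [1, 1, 1, 1, 1, 1]
```

Because the phrase counts sit alongside the unigram counts, vectors built this way could be passed to a standard vector-space classifier such as Naïve Bayes, k-NN or an SVM without changing the classifier itself.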

Item Type: Conference Paper
Publisher: IEEE
Additional Information: Copyright of this article belongs to IEEE.
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 02 Jul 2013 08:34
Last Modified: 02 Jul 2013 08:34
URI: http://eprints.iisc.ac.in/id/eprint/46624
