Gujraniya, Deepak and Murty, Narsimha M (2012) Efficient classification using phrases generated by topic models. In: 21st International Conference on Pattern Recognition (ICPR 2012), 11-15 Nov. 2012, Tsukuba, Japan.
Abstract
Many popular models are available for document classification, such as the Naïve Bayes classifier, k-Nearest Neighbors, and the Support Vector Machine. In all these cases, the representation is based on the “Bag of Words” model. This model does not capture the actual semantic meaning of a word in a particular document. Semantics are better captured by the proximity of words and their occurrence in the document. We propose a new “Bag of Phrases” model to capture this discriminative power of phrases for text classification. We present a novel algorithm to extract phrases from the corpus using the well-known topic model Latent Dirichlet Allocation (LDA), and to integrate them into the vector space model for classification. Experiments show that classifiers perform better with the new Bag of Phrases model than with related representation models.
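The contrast between the two representations can be illustrated with a minimal sketch. It is not the paper's LDA-based phrase extraction algorithm, which the abstract does not detail; as a stated assumption, plain runs of adjacent words stand in for phrases. The function names `bag_of_words` and `bag_of_phrases` and the two sample documents are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Count individual lowercase tokens; word order is discarded."""
    return Counter(text.lower().split())


def bag_of_phrases(text, n=2):
    """Count runs of n adjacent tokens.

    Assumption: adjacent-word runs are a simple stand-in for phrases,
    not the LDA-derived phrases proposed in the paper.
    """
    tokens = text.lower().split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Two hypothetical documents built from the same words in a different order.
doc_a = "the bank raised the interest rate"
doc_b = "the interest raised the bank rate"

# Identical word counts: the bag of words cannot tell them apart.
same_words = bag_of_words(doc_a) == bag_of_words(doc_b)

# Different phrase counts: word proximity separates the two documents.
same_phrases = bag_of_phrases(doc_a) == bag_of_phrases(doc_b)

print(same_words, same_phrases)
```

Under these assumptions the two documents share a word-count vector but differ in their phrase counts: only the first contains the phrase "interest rate". That difference is the kind of proximity information the abstract argues a classifier can exploit.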
Item Type: | Conference Paper |
---|---|
Publisher: | IEEE |
Additional Information: | Copyright of this article belongs to IEEE. |
Department/Centre: | Division of Electrical Sciences > Computer Science & Automation |
Date Deposited: | 02 Jul 2013 08:34 |
Last Modified: | 02 Jul 2013 08:34 |
URI: | http://eprints.iisc.ac.in/id/eprint/46624 |