
Semi-supervised SVMs for classification with unknown class proportions and a small labeled dataset

Selvaraj, Sathiya Keerthi and Bhar, Bigyan and Sellamanickam, Sundararajan and Shevade, Shirish (2011) Semi-supervised SVMs for classification with unknown class proportions and a small labeled dataset. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, 2011, New York, NY, USA.

Official URL: http://dx.doi.org/10.1145/2063576.2063674

Abstract

In the design of practical web page classification systems, one often encounters a situation in which the labeled training set is created by choosing some examples from each class, but the class proportions in this set differ from those in the test distribution to which the classifier will actually be applied. The problem is made worse when the amount of training data is also small. In this paper we explore and adapt binary SVM methods that make use of unlabeled data from the test distribution, viz., Transductive SVMs (TSVMs) and expectation regularization/constraint (ER/EC) methods, to deal with this situation. We empirically show that when the labeled training set is small, a TSVM designed using the class ratio tuned by minimizing the loss on the labeled set yields the best performance; its performance is good even when the deviation between the class ratios of the labeled training set and the test set is quite large. When the labeled training set is sufficiently large, an unsupervised Gaussian mixture model can be used to get a very good estimate of the class ratio in the test set; when this estimate is used, both TSVM and ER/EC give their best possible performance, with TSVM coming out superior. The ideas in the paper can be easily extended to multi-class SVMs and MaxEnt models.
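
As a rough illustration of the class-ratio estimation step mentioned in the abstract, the sketch below fits a two-component Gaussian mixture to unlabeled one-dimensional scores with EM and reads the class ratio off the mixing weights. The abstract does not say what the mixture is fitted on or how it is initialised, so the use of scalar scores (for example, outputs of a classifier trained on the labeled set), the function name `estimate_class_ratio`, and the synthetic data are all assumptions made for illustration, not the authors' procedure.

```python
import math
import random


def estimate_class_ratio(scores, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative sketch).

    Returns the mixing weight of the component with the larger mean,
    read here as the estimated fraction of positive examples.
    """
    lo, hi = min(scores), max(scores)
    mu = [lo, hi]                              # start the means at the extremes
    var = [((hi - lo) / 4) ** 2 or 1.0] * 2    # broad initial variances
    w = [0.5, 0.5]                             # equal initial mixing weights
    n_total = len(scores)

    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each score.
        resp = []
        for x in scores:
            p = [
                w[k] * math.exp(-((x - mu[k]) ** 2) / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in range(2)
            ]
            total = p[0] + p[1]
            resp.append(p[1] / total if total > 0 else 0.5)

        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            r = resp if k == 1 else [1.0 - g for g in resp]
            n_k = sum(r)
            w[k] = n_k / n_total
            if n_k < 1e-9:
                continue  # component is empty; keep its old mean and variance
            mu[k] = sum(g * x for g, x in zip(r, scores)) / n_k
            var[k] = max(
                sum(g * (x - mu[k]) ** 2 for g, x in zip(r, scores)) / n_k,
                1e-6,  # variance floor to avoid a degenerate component
            )

    return w[1] if mu[1] >= mu[0] else w[0]


# Synthetic unlabeled scores: 80% negatives, 20% positives (assumed data).
random.seed(0)
unlabeled = [random.gauss(-2.0, 0.7) for _ in range(800)]
unlabeled += [random.gauss(2.0, 0.7) for _ in range(200)]
print("estimated positive fraction:", round(estimate_class_ratio(unlabeled), 3))
```

An estimate of this kind could then be supplied as the class-ratio parameter of a TSVM or ER/EC method. How well it works depends on how cleanly the two classes separate in the chosen scores, which is consistent with the abstract's remark that the estimate is reliable only when the labeled set is sufficiently large.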

Item Type: Conference Paper
Publisher: Association for Computing Machinery
Additional Information: Copyright of this article belongs to Association for Computing Machinery.
Keywords: Transductive and Semi-Supervised Learning; Classification; Support Vector Machines
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 19 Mar 2013 09:03
Last Modified: 19 Mar 2013 09:03
URI: http://eprints.iisc.ac.in/id/eprint/46034
