
Focussed Crawling with large scale Ordinal Regression Solvers

Babaria, Rashmin and Saketha Nath, J and Krishnan, S and Sivaramakrishnan, KR and Bhattacharyya, Chiranjib and Murty, MN (2007) Focussed Crawling with large scale Ordinal Regression Solvers. In: ICML '07 Proceedings of the 24th International Conference on Machine Learning, New York, NY.

Official URL: http://dl.acm.org/citation.cfm?id=1273504

Abstract

In this paper we propose a novel, scalable, clustering-based Ordinal Regression formulation, which is an instance of a Second Order Cone Program (SOCP) with one Second Order Cone (SOC) constraint. The main contribution of the paper is a fast algorithm, CB-OR, which solves the proposed formulation more efficiently than general-purpose solvers. A second contribution is to pose the problem of focused crawling as a large-scale Ordinal Regression problem and solve it using the proposed CB-OR. Focused crawling is an efficient mechanism for discovering resources of interest on the web. Posing focused crawling as an Ordinal Regression problem avoids the need for a negative class and a topic hierarchy, which are the main drawbacks of existing focused crawling methods. Experiments on large synthetic and benchmark datasets show the scalability of CB-OR. Experiments also show that the proposed focused crawler outperforms the state of the art.

Item Type: Conference Paper
Publisher: ACM Press
Additional Information: Copyright of this article belongs to ACM Press.
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 17 Oct 2011 09:56
Last Modified: 17 Oct 2011 09:56
URI: http://eprints.iisc.ac.in/id/eprint/41489
