ePrints@IISc

Active learning in keyword search-based data integration

Yan, Zhepeng and Zheng, Nan and Ives, Zachary G and Talukdar, Partha Pratim and Yu, Cong (2015) Active learning in keyword search-based data integration. In: VLDB Journal, 24 (5, special issue). pp. 611-631.

PDF: VLDB_Jou_24-5_611_2015.pdf (Published Version, 2MB)
Restricted to registered users only
Official URL: http://dx.doi.org/10.1007/s00778-014-0374-x

Abstract

The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data experts. One promising approach is to avoid using a global schema and instead develop keyword search-based data integration, where the system lazily discovers associations that enable it to join together matches to keywords and return ranked results. The user is expected to understand the data domain and provide feedback about the quality of answers. The system generalizes such feedback to learn how to correctly integrate data. A major open challenge is that under this model, the user only sees and offers feedback on a few "top-k" results: this result set must be carefully selected to include answers of high relevance and answers that are highly informative when feedback is given on them. Existing systems focus merely on predicting relevance by composing the scores of various schema and record matching algorithms. In this paper, we show how to predict the uncertainty associated with a query result's score, as well as how informative feedback is on a given result. We build upon these foundations to develop an active learning approach to keyword search-based data integration, and we validate the effectiveness of our solution over real data from several very different domains.
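The abstract's central idea, choosing a small result set that mixes answers of high predicted relevance with answers whose scores are uncertain (so that feedback on them is informative), can be illustrated with a minimal sketch. This is not the paper's method: it swaps in a generic upper-confidence-bound style heuristic, and the names Result, select_top_k, mean_score, score_variance and the weight beta are hypothetical, assuming each answer already carries a predicted score mean and variance.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical query answer: predicted relevance score and its variance.
    answer_id: str
    mean_score: float
    score_variance: float


def select_top_k(results, k, beta=1.0):
    """Return the k answers with the highest mean + beta * stddev.

    This is an assumed UCB-style heuristic, not the authors' scoring model.
    beta = 0 ranks by predicted relevance alone; larger beta promotes
    answers whose scores are uncertain, where user feedback may teach
    the system more.
    """
    if k < 0:
        raise ValueError("k must be non-negative")

    def value(r):
        return r.mean_score + beta * math.sqrt(r.score_variance)

    return sorted(results, key=value, reverse=True)[:k]
```

For example, with answers a (mean 0.9, variance 0.0), b (mean 0.6, variance 0.01) and c (mean 0.5, variance 0.25), beta = 0 selects a and b, while beta = 1 selects c and a, because c's large uncertainty lifts its value to 1.0. A single weight is the simplest way to expose the relevance-versus-informativeness trade-off; the paper's own treatment of informativeness is likely more involved.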

Item Type: Journal Article
Publication: VLDB Journal
Publisher: Springer
Additional Information: Copyright for this article belongs to Springer, 233 Spring St, New York, NY 10013, USA.
Keywords: Data integration; Keyword search; Active learning
Department/Centre: Division of Interdisciplinary Sciences > Supercomputer Education & Research Centre
Date Deposited: 15 Oct 2015 06:10
Last Modified: 15 Oct 2015 06:10
URI: http://eprints.iisc.ac.in/id/eprint/52549
