
Negative Sampling for Hyperlink Prediction in Networks

Patil, P and Sharma, G and Murty, MN (2020) Negative Sampling for Hyperlink Prediction in Networks. In: 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2020, 11-14 May 2020, Singapore; Singapore, pp. 607-619.

LEC_NOT_COM_SCI_12085_LNAI_607-619_2020.pdf - Published Version

Official URL: https://dx.doi.org/10.1007/978-3-030-47436-2_46

Abstract

While graphs capture pairwise relations between entities, hypergraphs deal with higher-order ones, thereby ensuring losslessness. However, in hyperlink (i.e., higher-order link) prediction, where hyperlinks and non-hyperlinks are treated as "positive" and "negative" classes respectively, hypergraphs suffer from the problem of extreme class imbalance. Given this context, "negative sampling" (under-sampling the negative class of non-hyperlinks) becomes mandatory for performing hyperlink prediction. No prior work on hyperlink prediction deals with this problem. In this work, which is the first of its kind, we deal with this problem in the context of hyperlink prediction. More specifically, we leverage graph sampling techniques for sampling non-hyperlinks in hyperlink prediction. Our analysis clearly establishes the effect of random sampling, which is the norm in both link- as well as hyperlink-prediction. Further, we formalize the notion of "hardness" of non-hyperlinks via a measure of density, and analyze its distribution over various negative sampling techniques. We experiment with some real-world hypergraph datasets and provide both qualitative and quantitative results on the effects of negative sampling. We also establish its importance in evaluating hyperlink prediction algorithms. © Springer Nature Switzerland AG 2020.
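
The class imbalance and the random-sampling baseline mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the toy hypergraph, and the choice to give each sampled non-hyperlink the size of a randomly chosen observed hyperlink are all assumptions made for illustration. The paper's graph-sampling techniques and its density-based hardness measure are not reproduced here.

```python
import random
from math import comb


def count_candidate_sets(num_nodes, min_size=2):
    """Count node subsets of size >= min_size, i.e. all possible hyperlinks."""
    return sum(comb(num_nodes, k) for k in range(min_size, num_nodes + 1))


def sample_negatives(nodes, hyperlinks, num_samples, seed=0, max_tries=10_000):
    """Uniform-random under-sampling of non-hyperlinks by rejection.

    Assumption of this sketch: each candidate takes the size of a randomly
    chosen observed hyperlink, and is rejected if it is an observed hyperlink
    or was already sampled.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    positives = {frozenset(h) for h in hyperlinks}
    sizes = [len(h) for h in hyperlinks]
    negatives, seen = [], set()
    for _ in range(max_tries):
        if len(negatives) == num_samples:
            break
        candidate = frozenset(rng.sample(nodes, rng.choice(sizes)))
        if candidate not in positives and candidate not in seen:
            seen.add(candidate)
            negatives.append(candidate)
    if len(negatives) < num_samples:
        raise ValueError("could not draw enough distinct non-hyperlinks")
    return negatives


# Toy hypergraph (hypothetical data): 6 nodes, 3 observed hyperlinks.
toy_nodes = range(6)
toy_hyperlinks = [{0, 1, 2}, {1, 3}, {2, 4, 5}]

total = count_candidate_sets(6)            # 2**6 - 6 - 1 candidate node sets
negatives = sample_negatives(toy_nodes, toy_hyperlinks, num_samples=5)
print(total - len(toy_hyperlinks), "non-hyperlinks vs", len(toy_hyperlinks), "hyperlinks")
```

Even on six nodes the non-hyperlinks outnumber the hyperlinks many times over, and the count of candidate sets grows exponentially with the number of nodes, which is why some form of under-sampling is needed before a classifier can be trained or evaluated.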

Item Type: Conference Paper
Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer
Additional Information: cited By 0; Conference of 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2020 ; Conference Date: 11 May 2020 Through 14 May 2020; Conference Code:240129
Keywords: Data mining; Forecasting; Graph theory, Class imbalance; Graph samplings; Higher-order; Prediction algorithms; Quantitative result; Random sampling; Sampling technique; Under-sampling, Hypertext systems
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 14 Dec 2020 11:41
Last Modified: 14 Dec 2020 11:41
URI: http://eprints.iisc.ac.in/id/eprint/65873
