Dominant strategy truthful, deterministic multi-armed bandit mechanisms with logarithmic regret for sponsored search auctions

Padmanabhan, D and Bhat, S and Prabuchandran, KJ and Shevade, S and Narahari, Y (2022) Dominant strategy truthful, deterministic multi-armed bandit mechanisms with logarithmic regret for sponsored search auctions. In: Applied Intelligence, 52 (3). pp. 3209-3226.

Official URL: https://doi.org/10.1007/s10489-021-02387-2

Abstract

Stochastic multi-armed bandit (MAB) mechanisms are widely used in sponsored search auctions, crowdsourcing, online procurement, etc. Existing stochastic MAB mechanisms with a deterministic payment rule, proposed in the literature, necessarily suffer a regret of Ω(T^(2/3)), where T is the number of time steps. This happens because the existing mechanisms consider the worst case scenario where the means of the agents’ stochastic rewards are separated by a very small amount that depends on T. We make and exploit the crucial observation that in most scenarios, the separation between the agents’ rewards is rarely a function of T. Moreover, in the case that the rewards of the arms are arbitrarily close, the regret contributed by such sub-optimal arms is minimal. Our idea is to allow the center to indicate the resolution, Δ, with which the agents must be distinguished. This immediately leads us to introduce the notion of Δ-regret. Using sponsored search auctions as a concrete example (the same idea applies to other applications as well), we propose a dominant strategy incentive compatible (DSIC) and individually rational (IR), deterministic MAB mechanism, based on ideas from the Upper Confidence Bound (UCB) family of MAB algorithms. Remarkably, the proposed mechanism Δ-UCB achieves a Δ-regret of O(log T) for the case of sponsored search auctions. We first establish the results for single slot sponsored search auctions and then non-trivially extend the results to the case where multiple slots are to be allocated.
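
The notion of Δ-regret can be illustrated with a minimal Python sketch. This is not the Δ-UCB mechanism from the paper: it runs the textbook UCB1 rule on Bernoulli arms, with no bids and no payments, and it assumes one plausible reading of Δ-regret in which only arms whose mean falls short of the best mean by more than Δ contribute to the regret. The function names `ucb1` and `delta_regret`, and that definition, are illustrative assumptions.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Textbook UCB1 on Bernoulli arms (not the paper's mechanism).

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initial round: pull every arm once
        else:
            # empirical mean plus the usual UCB1 exploration bonus
            arm = max(
                range(k),
                key=lambda i: totals[i] / pulls[i]
                + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


def delta_regret(means, pulls, delta):
    """Assumed definition: expected regret counted only over arms
    whose gap to the best mean is strictly larger than delta."""
    best = max(means)
    return sum(n * (best - m) for m, n in zip(means, pulls) if best - m > delta)


if __name__ == "__main__":
    means = [0.5, 0.495, 0.3]  # hypothetical click probabilities
    pulls = ucb1(means, horizon=5000)
    print("pulls:", pulls)
    print("regret (delta = 0):   ", delta_regret(means, pulls, 0.0))
    print("regret (delta = 0.01):", delta_regret(means, pulls, 0.01))
```

With Δ = 0 the function reduces to the ordinary expected regret. With Δ = 0.01 the second arm, whose gap is 0.005, is treated as indistinguishable from the best arm and contributes nothing, which mirrors the abstract's point that arbitrarily close arms add little regret.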

Item Type:	Journal Article
Publication:	Applied Intelligence
Publisher:	Springer
Additional Information:	The copyright for this article belongs to the Authors.
Keywords:	Commerce; Dominant strategy; Incentive compatible; Multi armed bandit; Online procurement; Sponsored search auctions; Time step; Upper confidence bound; Worst case scenario; Stochastic systems
Department/Centre:	Division of Electrical Sciences > Computer Science & Automation
Date Deposited:	27 Jun 2022 05:43
Last Modified:	27 Jun 2022 05:43
URI:	https://eprints.iisc.ac.in/id/eprint/73894
