Thompson sampling based mechanisms for stochastic multi-armed bandit problems

Ghalme, G and Gujar, S and Jain, S and Narahari, Y (2017) Thompson sampling based mechanisms for stochastic multi-armed bandit problems. In: 16th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2017, 8 May 2017-12 May 2017, Sao Paulo, pp. 87-95.

Official URL: https://doi.org/10.1016/j.engfailanal.2022.106442

Abstract

This paper explores Thompson sampling in the context of mechanism design for stochastic multi-armed bandit (MAB) problems. The setting is that of an MAB problem where the reward distribution of each arm consists of a stochastic component as well as a strategic component. Many existing MAB mechanisms use upper confidence bound (UCB) based algorithms for learning the parameters of the reward distribution. The randomized nature of Thompson sampling introduces certain unique, non-trivial challenges for mechanism design, which we address in this paper through a rigorous regret analysis. We first propose an MAB mechanism with a deterministic payment rule, namely, TSM-D. We show that in TSM-D, the variance of agent utilities asymptotically approaches zero. However, the game-theoretic properties satisfied by TSM-D (incentive compatibility and individual rationality with high probability) are rather weak. As our main contribution, we then propose the mechanism TSM-R, with a randomized payment rule, and prove that TSM-R satisfies appropriate, adequate notions of incentive compatibility and individual rationality. For TSM-R, we also establish a theoretical upper bound on the variance in utilities of the agents. We further show, using simulations, that the variance in social welfare incurred by TSM-D or TSM-R is much lower than that of existing UCB-based mechanisms. We believe this paper establishes Thompson sampling as an attractive approach for MAB mechanism design.
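
For readers unfamiliar with the learning rule the abstract refers to, the following is a minimal sketch of generic Thompson sampling on a Bernoulli bandit. It is not the paper's TSM-D or TSM-R mechanism: it has no strategic component and no payment rule. The function name `thompson_sampling`, its parameters, the uniform Beta(1, 1) prior, and the example reward means are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Generic Beta-Bernoulli Thompson sampling (illustrative only).

    true_means: hypothetical success probability of each arm.
    horizon: number of rounds to play.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one sample per arm from its Beta(1 + s, 1 + f) posterior.
        samples = [
            rng.betavariate(1 + successes[i], 1 + failures[i])
            for i in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=samples.__getitem__)
        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.9], horizon=2000))
```

Because the arm played in each round depends on random posterior draws rather than on a deterministic index as in UCB, the allocation itself is a random variable, which is the source of the mechanism-design challenges the abstract mentions.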

Item Type: Conference Paper
Publication: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Additional Information: The copyright for this article belongs to International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Keywords: Game theory; Machine design; Multi agent systems; Stochastic systems; Incentive compatibility; Individual rationality; Mechanism design; Multi armed bandit; Multi-armed bandit problem; Stochastic component; Thompson samplings; Upper confidence bound; Autonomous agents
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 27 Jul 2022 06:21
Last Modified: 27 Jul 2022 06:21
URI: https://eprints.iisc.ac.in/id/eprint/74684