Chatterjee, A and Ghalme, G and Jain, S and Vaish, R and Narahari, Y (2017) Analysis of Thompson sampling for stochastic sleeping bandits. In: 33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017, 11 - 15 August 2017, Sydney.
Abstract
We study a variant of the stochastic multi-armed bandit problem in which the set of available arms varies arbitrarily with time (also known as the sleeping bandit problem). We focus on the Thompson Sampling algorithm and consider a regret notion defined with respect to the best available arm. Our main result is an O(log T) regret bound for Thompson Sampling, which generalizes a similar bound known for this algorithm in the classical bandit setting. Our bound also matches (up to constants) the best-known lower bound for the sleeping bandit problem. We show via simulations that Thompson Sampling outperforms the UCB-style AUER algorithm for the sleeping bandit problem.
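The setting described in the abstract can be illustrated with a minimal sketch: Thompson Sampling with Beta priors on Bernoulli arms, where each round the learner samples only from the arms that are currently available (awake) and plays the arm with the highest posterior sample. The arm means, the availability function `avail_fn`, and all parameter choices below are illustrative assumptions, not taken from the paper.

```python
import random

def thompson_sleeping(T, n_arms, means, avail_fn, seed=0):
    """Sketch of Thompson Sampling for sleeping Bernoulli bandits.

    Maintains a Beta(s, f) posterior per arm. Each round, only the
    arms returned by avail_fn(t) are awake; we sample a mean from
    each awake arm's posterior and pull the argmax.
    (Illustrative only; `means` and `avail_fn` are assumed inputs.)
    """
    rng = random.Random(seed)
    s = [1] * n_arms  # Beta alpha parameters (successes + 1)
    f = [1] * n_arms  # Beta beta parameters (failures + 1)
    total_reward = 0
    for t in range(T):
        awake = avail_fn(t)  # arms available at time t
        samples = {a: rng.betavariate(s[a], f[a]) for a in awake}
        arm = max(samples, key=samples.get)
        reward = 1 if rng.random() < means[arm] else 0
        total_reward += reward
        if reward:
            s[arm] += 1
        else:
            f[arm] += 1
    return total_reward
```

With a clearly better arm that is sometimes asleep, the sampler concentrates on the best *available* arm each round, which is exactly the regret benchmark the paper analyzes.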
| Item Type | Conference Paper |
|---|---|
| Publication | Uncertainty in Artificial Intelligence - Proceedings of the 33rd Conference, UAI 2017 |
| Publisher | AUAI Press Corvallis |
| Additional Information | The copyright for this article belongs to the AUAI Press Corvallis. |
| Keywords | Artificial intelligence; Probability; Stochastic systems; Bandit problems; Lower bounds; Multi-armed bandit problem; Regret bounds; Thompson sampling; Sleep research |
| Department/Centre | Division of Electrical Sciences > Computer Science & Automation |
| Date Deposited | 22 Jul 2022 11:18 |
| Last Modified | 22 Jul 2022 11:18 |
| URI | https://eprints.iisc.ac.in/id/eprint/74737 |