Ballooning Multi-Armed Bandits

Ghalme, G and Dhamal, S and Jain, S and Gujar, S and Narahari, Y (2021) Ballooning Multi-Armed Bandits. In: ALA 2020 - Adaptive and Learning Agents Workshop at AAMAS 2020.

art_int_296_2021.pdf - Published Version

Official URL: https://doi.org/10.1016/j.artint.2021.103485

Abstract

In this paper, we introduce ballooning multi-armed bandits (BL-MAB), a novel extension to the classical stochastic MAB model. In the BL-MAB model, the set of available arms grows (or balloons) over time. In contrast to the classical MAB setting where the regret is computed with respect to the best arm overall, the regret in a BL-MAB setting is computed with respect to the best available arm at each time. We first observe that the existing stochastic MAB algorithms are not regret-optimal for the BL-MAB model. We show that if the best arm is equally likely to arrive at any time, a sublinear regret cannot be achieved, irrespective of the arrival of other arms. We further show that if the best arm is more likely to arrive in the early rounds, one can achieve sub-linear regret. Our proposed algorithm determines (1) the fraction of the time horizon for which the newly arriving arms should be explored and (2) the sequence of arm pulls in the exploitation phase from among the explored arms. Making reasonable assumptions on the arrival distribution of the best arm in terms of the thinness of the distribution's tail, we prove that the proposed algorithm achieves sub-linear instance-independent regret. We further quantify explicit dependence of regret on the arrival distribution parameters. We reinforce our theoretical findings with extensive simulation results. © 2020 ALA 2020 - Adaptive and Learning Agents Workshop at AAMAS 2020. All rights reserved.
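
The regret notion described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it assumes Bernoulli rewards, a fixed cutoff at a given fraction of the horizon after which newly arriving arms are ignored, and a standard UCB1 index over the arms that arrived before the cutoff. The function name `simulate_bl_mab` and all of its parameters are hypothetical, chosen only for illustration. Regret is accumulated against the best arm available in each round, as in the BL-MAB setting.

```python
import math
import random


def simulate_bl_mab(means, arrival_times, horizon, explore_fraction, seed=0):
    """Return cumulative pseudo-regret against the best AVAILABLE arm per round.

    Assumptions (not taken from the paper): Bernoulli rewards with the given
    means, arm i becomes available at round arrival_times[i] (1-indexed), and
    arms arriving after the cutoff round are never pulled.
    """
    cutoff = int(explore_fraction * horizon)
    if cutoff < 1 or min(arrival_times) > 1:
        raise ValueError("need cutoff >= 1 and at least one arm present at round 1")

    rng = random.Random(seed)
    counts = [0] * len(means)   # number of pulls per arm
    sums = [0.0] * len(means)   # total observed reward per arm
    regret = 0.0

    for t in range(1, horizon + 1):
        # Arms that have arrived by round t.
        available = [i for i, a in enumerate(arrival_times) if a <= t]
        # Only arms that arrived by the cutoff are ever considered for pulling.
        candidates = [i for i in available if arrival_times[i] <= cutoff]

        untried = [i for i in candidates if counts[i] == 0]
        if untried:
            arm = untried[0]  # pull each candidate once before using the index
        else:
            # UCB1 index: empirical mean plus a confidence bonus.
            arm = max(
                candidates,
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )

        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

        # BL-MAB regret: compare with the best arm available at this round.
        best_available = max(means[i] for i in available)
        regret += best_available - means[arm]

    return regret
```

For example, `simulate_bl_mab([0.5, 0.9], [1, 51], horizon=100, explore_fraction=0.2)` returns a regret of about 20: the better arm arrives at round 51, after the cutoff at round 20, so it is ignored for all 50 rounds in which it is available, at a cost of 0.4 per round. This mirrors the abstract's point that a late-arriving best arm is costly under this regret definition.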

Item Type:	Journal Article
Publication:	ALA 2020 - Adaptive and Learning Agents Workshop at AAMAS 2020
Publisher:	Elsevier
Additional Information:	The copyright for this article belongs to Elsevier.
Keywords:	Intelligent agents; Stochastic systems, Distribution parameters; Explicit dependences; Extensive simulations; Multiarmed bandits (MABs); Pull-in; Stochastics; Sublinear; Time horizons, Stochastic models
Department/Centre:	Division of Electrical Sciences > Computer Science & Automation
Date Deposited:	28 Feb 2024 09:48
Last Modified:	28 Feb 2024 09:48
URI:	https://eprints.iisc.ac.in/id/eprint/83577
