Transfer in Sequential Multi-Armed Bandits via Reward Samples

Rahul, NR and Katewa, V (2024) Transfer in Sequential Multi-Armed Bandits via Reward Samples. In: 2024 European Control Conference, ECC 2024, 25 June 2024 through 28 June 2024, Stockholm, pp. 2083-2089.

Official URL: https://doi.org/10.23919/ECC64448.2024.10590903

Abstract

We consider a sequential stochastic multi-armed bandit problem where the agent interacts with the bandit over multiple episodes. The reward distribution of the arms remains constant throughout an episode but can change over different episodes. We propose an algorithm based on UCB to transfer the reward samples from the previous episodes and improve the cumulative regret performance over all the episodes. We provide regret analysis and empirical results for our algorithm, which show significant improvement over the standard UCB algorithm without transfer. © 2024 EUCA.
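
For readers unfamiliar with the baseline, the following is a minimal sketch of standard UCB1 on Bernoulli arms, with an optional warm start from reward samples collected earlier. It is an illustration only and not the algorithm proposed in the paper: the abstract does not give the transfer rule, so the naive pooling of all previous samples shown here is an assumption, and it is sensible only if the reward distributions have not changed much between episodes. The function name `ucb1`, the `prior_samples` parameter, and the Bernoulli reward model are all hypothetical choices made for this sketch.

```python
import math
import random


def ucb1(arm_means, horizon, prior_samples=None, seed=0):
    """Run UCB1 on Bernoulli arms and return the cumulative pseudo-regret.

    arm_means:     true success probability of each arm (used to simulate
                   rewards and to measure regret; unknown to the learner).
    prior_samples: optional list with one list of past rewards per arm.
                   ASSUMPTION: these are pooled naively with new samples,
                   which is not necessarily what the paper does.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of samples available for each arm
    sums = [0.0] * k   # sum of rewards observed for each arm

    # Warm start: treat transferred samples as if they were pulls.
    if prior_samples is not None:
        for arm, samples in enumerate(prior_samples):
            counts[arm] += len(samples)
            sums[arm] += sum(samples)

    best = max(arm_means)
    regret = 0.0
    for _ in range(horizon):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            # Pull every arm without data once before using the index.
            arm = untried[0]
        else:
            total = sum(counts)
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]
    return regret
```

Calling `ucb1([0.5, 0.9], horizon=1000)` runs the baseline without transfer, while passing `prior_samples=[[0.0, 1.0], [1.0, 1.0, 1.0]]` starts the episode with those rewards already counted. When the distributions do change across episodes, pooled samples can bias the empirical means, which is the difficulty a transfer algorithm has to handle.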

Item Type:	Conference Paper
Publication:	2024 European Control Conference, ECC 2024
Publisher:	Institute of Electrical and Electronics Engineers Inc.
Additional Information:	The copyright for this article belongs to the authors.
Keywords:	Multiarmed bandit problems (MABP); Multiarmed bandits (MABs); Performance; Stochastics; Stochastic systems
Department/Centre:	Division of Electrical Sciences > Electrical Communication Engineering; Division of Interdisciplinary Sciences > Robert Bosch Centre for Cyber Physical Systems
Date Deposited:	06 Sep 2024 10:18
Last Modified:	06 Sep 2024 10:18
URI:	http://eprints.iisc.ac.in/id/eprint/86030
