Rahul, NR and Katewa, V (2024) Transfer in Sequential Multi-Armed Bandits via Reward Samples. In: 2024 European Control Conference, ECC 2024, 25 June 2024 through 28 June 2024, Stockholm, pp. 2083-2089.
Abstract
We consider a sequential stochastic multi-armed bandit problem where the agent interacts with the bandit over multiple episodes. The reward distribution of the arms remains constant throughout an episode but can change over different episodes. We propose an algorithm based on UCB to transfer the reward samples from the previous episodes and improve the cumulative regret performance over all the episodes. We provide regret analysis and empirical results for our algorithm, which show significant improvement over the standard UCB algorithm without transfer. © 2024 EUCA.
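For orientation, the baseline named in the abstract, standard UCB, can be sketched as below. This is a minimal illustration of UCB1 on Bernoulli arms and not the algorithm proposed in the paper. The function name `ucb1`, the `prior_samples` parameter, and the naive pooling of earlier rewards are hypothetical, added only to show where samples from a previous episode could enter the index; the paper's actual transfer rule is not reproduced here.

```python
import math
import random


def ucb1(means, horizon, prior_samples=None, seed=0):
    """Run UCB1 on Bernoulli arms; return (cumulative pseudo-regret, pulls).

    prior_samples is a hypothetical argument: one list of past rewards per
    arm, pooled naively with new rewards. This pooling is an illustrative
    assumption, not the transfer rule from the paper.
    """
    rng = random.Random(seed)
    k = len(means)
    prior = prior_samples if prior_samples is not None else [[] for _ in range(k)]
    counts = [len(p) for p in prior]       # samples available per arm
    sums = [float(sum(p)) for p in prior]  # reward totals per arm
    pulls = [0] * k                        # pulls made in this episode only
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            # Play every arm once before the index is defined.
            arm = untried[0]
        else:
            total = sum(counts)
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulls[arm] += 1
        regret += best - means[arm]
    return regret, pulls
```

Calling `ucb1([0.2, 0.8], 2000)` runs one episode without transfer; passing `prior_samples` seeds the counts and sums with earlier rewards, which only helps if the arm distributions have not changed much between episodes.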
Item Type: | Conference Paper |
---|---|
Publication: | 2024 European Control Conference, ECC 2024 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Additional Information: | The copyright for this article belongs to Authors. |
Keywords: | Multiarmed bandit problems (MABP); Multiarmed bandits (MABs); Performance; Stochastics; Stochastic systems |
Department/Centre: | Division of Electrical Sciences > Electrical Communication Engineering; Division of Interdisciplinary Sciences > Robert Bosch Centre for Cyber Physical Systems |
Date Deposited: | 06 Sep 2024 10:18 |
Last Modified: | 06 Sep 2024 10:18 |
URI: | http://eprints.iisc.ac.in/id/eprint/86030 |