On a class of restless multi-armed bandits with deterministic policies

Jhunjhunwala, PR and Moharir, S and Manjunath, D and Gopalan, A (2019) On a class of restless multi-armed bandits with deterministic policies. In: 12th International Conference on Signal Processing and Communications, SPCOM 2018, 16 - 19 July 2018, Bangalore, pp. 487-491.

Official URL: https://doi.org/10.1109/SPCOM.2018.8724432

Abstract

We describe and analyze a restless multi-armed bandit (RMAB) in which, in each time-slot, the instantaneous reward from playing an arm depends on the time since the arm was last played. This model is motivated by recommendation systems where the payoff from a recommendation depends on the recommendation history. For an RMAB with N arms and known reward functions for each arm that have a finite support (akin to a maximum memory) of M steps, we characterize the optimal policy that maximizes the infinite-horizon time-average of the reward. Specifically, using a weighted-graph representation of the system evolution, we show that a periodic policy is optimal. Further, we show that the optimal periodic policy can be obtained using an algorithm with polynomial time and space complexity. Some extensions to the basic model are also presented; several more are possible. RMABs with such large state spaces for the arms have not been previously considered.
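
The weighted-graph view in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it enumerates states by brute force (exponential in N) and applies Karp's maximum mean-weight cycle algorithm, which is a standard technique swapped in here for illustration. The state encoding (each arm's time since last play, capped at M), the assumption that exactly one arm is played per slot, the reward table, and the names `step`, `build_graph`, and `max_mean_cycle` are all hypothetical choices, not taken from the paper.

```python
def step(state, arm, M):
    """Play `arm`: its age resets to 1; every other age grows by 1, capped at M."""
    return tuple(1 if j == arm else min(age + 1, M) for j, age in enumerate(state))


def build_graph(rewards, M, start):
    """Enumerate states reachable from `start`.

    edges[s] is a list of (next_state, reward), one entry per arm.
    rewards[arm][age - 1] is the assumed reward for playing `arm` at that age.
    """
    edges, frontier = {}, [start]
    while frontier:
        s = frontier.pop()
        if s in edges:
            continue
        edges[s] = []
        for arm, age in enumerate(s):
            nxt = step(s, arm, M)
            edges[s].append((nxt, rewards[arm][age - 1]))
            frontier.append(nxt)
    return edges


def max_mean_cycle(edges, start):
    """Karp's algorithm, maximum version; every node must be reachable from `start`."""
    nodes = list(edges)
    n = len(nodes)
    neg_inf = float("-inf")
    # D[k][v] = best total reward of a walk with exactly k edges from start to v.
    D = [{v: neg_inf for v in nodes} for _ in range(n + 1)]
    D[0][start] = 0.0
    for k in range(1, n + 1):
        for u in nodes:
            if D[k - 1][u] == neg_inf:
                continue
            for v, w in edges[u]:
                D[k][v] = max(D[k][v], D[k - 1][u] + w)
    best = neg_inf
    for v in nodes:
        if D[n][v] == neg_inf:
            continue
        worst = min(
            (D[n][v] - D[k][v]) / (n - k)
            for k in range(n)
            if D[k][v] != neg_inf
        )
        best = max(best, worst)
    return best


# Hypothetical instance: N = 2 arms, memory M = 3.
# Arm 0 always pays 1; arm 1 pays 5 only after resting for at least 3 slots.
M = 3
rewards = [[1, 1, 1], [0, 0, 5]]
start = (M, M)  # assume neither arm has been played recently
graph = build_graph(rewards, M, start)
print(max_mean_cycle(graph, start))
```

For this toy instance the best cycle plays arm 0 twice and then arm 1, for a time-average reward of 7/3, which matches the abstract's claim that a periodic policy is optimal. Recovering the cycle itself, rather than only its value, would need extra bookkeeping that is omitted here.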

Item Type: Conference Paper
Publication: SPCOM 2018 - 12th International Conference on Signal Processing and Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to Institute of Electrical and Electronics Engineers Inc.
Keywords: Polynomial approximation; Finite supports; Infinite horizons; Multi armed bandit; Optimal policies; Polynomial-time; Restless multi-armed bandit; Reward function; System evolution; Signal processing
Department/Centre: Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited: 08 Aug 2022 05:06
Last Modified: 08 Aug 2022 05:06
URI: https://eprints.iisc.ac.in/id/eprint/75479
