Achieving fairness in the stochastic multi-armed bandit problem

Patil, V and Ghalme, G and Nair, V and Narahari, Y (2020) Achieving fairness in the stochastic multi-armed bandit problem. In: 34th AAAI Conference on Artificial Intelligence, AAAI 2020, 7 February - 12 February 2020, pp. 5379-5386.

Official URL: https://doi.org/10.48550/arXiv.1907.10516

Abstract

We study an interesting variant of the stochastic multi-armed bandit problem, which we call the FAIR-MAB problem, where, in addition to the objective of maximizing the sum of expected rewards, the algorithm also needs to ensure that at any time, each arm is pulled at least a pre-specified fraction of times. We investigate the interplay between learning and fairness in terms of a pre-specified vector denoting the fractions of guaranteed pulls. We define a fairness-aware regret, which we call r-Regret, that takes into account the above fairness constraints and extends the conventional notion of regret in a natural way. Our primary contribution is to obtain a complete characterization of a class of FAIR-MAB algorithms via two parameters: the unfairness tolerance and the learning algorithm used as a black-box. For this class of algorithms, we provide a fairness guarantee that holds uniformly over time, irrespective of the choice of the learning algorithm. Further, when the learning algorithm is UCB1, we show that our algorithm achieves constant r-Regret for a large enough time horizon. Finally, we analyze the cost of fairness in terms of the conventional notion of regret. We conclude by experimentally validating our theoretical results.
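
As a rough illustration of the class of algorithms the abstract describes, the sketch below wraps a learning algorithm (here UCB1) with a fairness check governed by an unfairness tolerance. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not spell out the selection rule, so the rule used here (pull the arm with the largest shortfall whenever that shortfall exceeds the tolerance, otherwise defer to the learner) is an assumption, as are the names `fair_learn`, `ucb1_choice`, `fractions`, and `alpha`, and the choice of Bernoulli rewards.

```python
import math
import random


def ucb1_choice(counts, sums, t):
    """Standard UCB1: try every arm once, then maximize mean + exploration bonus."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(counts)),
        key=lambda i: sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]),
    )


def fair_learn(means, fractions, alpha, horizon, seed=0):
    """Simulate a fairness-aware bandit on Bernoulli arms (illustrative only).

    means     -- true success probability of each arm (hidden from the learner)
    fractions -- guaranteed pull fraction per arm; assumed to sum to less than 1
    alpha     -- unfairness tolerance (assumed non-negative)
    Returns the pull counts and the largest shortfall observed after any round.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    max_deficit = 0.0
    for t in range(1, horizon + 1):
        # Shortfall of each arm relative to its quota over the first t - 1 rounds.
        deficits = [fractions[i] * (t - 1) - counts[i] for i in range(k)]
        lagging = [i for i in range(k) if deficits[i] > alpha]
        if lagging:
            # Assumed fairness rule: serve the arm that is furthest behind.
            arm = max(lagging, key=lambda i: deficits[i])
        else:
            # No arm exceeds the tolerance, so use the black-box learner.
            arm = ucb1_choice(counts, sums, t)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Track the worst shortfall after this round's pull.
        worst = max(fractions[i] * t - counts[i] for i in range(k))
        max_deficit = max(max_deficit, worst)
    return counts, max_deficit


if __name__ == "__main__":
    counts, max_deficit = fair_learn(
        means=[0.9, 0.5, 0.1], fractions=[0.2, 0.2, 0.2], alpha=1.0, horizon=2000
    )
    print("pull counts:", counts)
    print("largest shortfall seen:", round(max_deficit, 3))
```

In this sketch a larger `alpha` lets the learner run unconstrained for longer before the fairness check intervenes, while `alpha = 0` enforces the quotas most tightly. Swapping `ucb1_choice` for another selection function leaves the fairness check unchanged, which mirrors the abstract's point that the fairness guarantee does not depend on the learning algorithm.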

Item Type:	Conference Paper
Publication:	AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
Publisher:	AAAI Press
Additional Information:	The copyright for this article belongs to AAAI Press.
Keywords:	Artificial intelligence; Probability; Statistics; Stochastic systems; Black boxes; Fairness constraints; Fairness guarantee; Multi-armed bandit problem; Primary contribution; Time horizons; Two parameter; Learning algorithms
Department/Centre:	Division of Electrical Sciences > Computer Science & Automation
Date Deposited:	07 Feb 2023 05:19
Last Modified:	07 Feb 2023 05:19
URI:	https://eprints.iisc.ac.in/id/eprint/79985
