Improved Pure Exploration in Linear Bandits with No-Regret Learning

Zaki, M and Mohan, A and Gopalan, A (2022) Improved Pure Exploration in Linear Bandits with No-Regret Learning. In: 31st International Joint Conference on Artificial Intelligence, IJCAI 2022, 23 - 2022, Vienna, pp. 3709-3715.


Official URL: https://doi.org/10.24963/ijcai.2022/515

Abstract

We study the best arm identification problem in linear multi-armed bandits (LMAB) in the fixed confidence setting; this is also the problem of optimizing an unknown linear function over a discrete domain with noisy, zeroth-order access. We propose an explicitly implementable algorithm for best arm identification with provably order-optimal sample complexity. Existing approaches that achieve optimal sample complexity assume either access to a nontrivial minimax optimization oracle (e.g. RAGE and Lazy-TS) or knowledge of an upper bound on the norm of the unknown parameter vector (e.g. LinGame). Our algorithm, which we call the Phased Elimination Linear Exploration Game (PELEG), maintains a high-probability confidence ellipsoid containing the unknown vector and uses it to eliminate suboptimal arms in phases, where a minimax problem is essentially solved in each phase using two-player low-regret learners. PELEG does not require knowledge of a bound on the norm of the unknown vector and is asymptotically optimal, settling an open problem. We show that the sample complexity of PELEG matches, up to order and in a non-asymptotic sense, an instance-dependent universal lower bound for linear bandits. PELEG is thus the first algorithm to achieve both order-optimal sample complexity and explicit implementability for this setting. We also provide numerical results for the proposed algorithm consistent with its theoretical guarantees. © 2022 International Joint Conferences on Artificial Intelligence. All rights reserved.
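The elimination step mentioned in the abstract can be illustrated with a generic confidence-ellipsoid rule. The following is a minimal sketch under stated assumptions, not PELEG itself: the function name `eliminate_arms`, the radius parameter `beta`, and the toy instance are all hypothetical, and the paper's actual confidence radius and its game-based sampling rule are not reproduced here. An arm is dropped when another active arm beats it by more than the ellipsoid width along their difference direction.

```python
import numpy as np

def eliminate_arms(arms, active, V, theta_hat, beta):
    """Return the indices in `active` that survive one elimination pass.

    Arm i is dropped if some active arm j satisfies
        (x_j - x_i)^T theta_hat > beta * ||x_j - x_i||_{V^{-1}},
    i.e. j is better than i for every parameter in the ellipsoid
    (beta is an assumed, user-supplied confidence radius).
    """
    V_inv = np.linalg.inv(V)
    survivors = []
    for i in active:
        dominated = False
        for j in active:
            if j == i:
                continue
            diff = arms[j] - arms[i]
            gap_hat = diff @ theta_hat                  # estimated reward gap
            width = beta * np.sqrt(diff @ V_inv @ diff)  # ellipsoid width along diff
            if gap_hat > width:
                dominated = True
                break
        if not dominated:
            survivors.append(i)
    return survivors

# Toy instance (hypothetical): three orthogonal arms, noise-free rewards.
arms = np.eye(3)
theta = np.array([1.0, 0.5, 0.0])
n = 100                                   # pulls per arm
V = n * arms.T @ arms                     # design matrix after n pulls of each arm
theta_hat = np.linalg.solve(V, n * arms.T @ (arms @ theta))  # least-squares estimate
survivors = eliminate_arms(arms, [0, 1, 2], V, theta_hat, beta=1.0)
print(survivors)
```

With few samples the ellipsoid is wide and no arm can be ruled out; as the design matrix grows, the widths shrink and suboptimal arms are removed. How the samples are allocated between phases is exactly where the minimax problem described in the abstract enters, and that part is not sketched here.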

Item Type:	Conference Paper
Publication:	IJCAI International Joint Conference on Artificial Intelligence
Publisher:	International Joint Conferences on Artificial Intelligence
Additional Information:	The copyright for this article belongs to the International Joint Conferences on Artificial Intelligence.
Keywords:	Artificial intelligence; Computational complexity; Complexity algorithms; Discrete domains; Identification problem; Linear functions; Minimax optimization; Multiarmed bandits (MABs); No-regret learning; Optimal samples; Sample complexity; Upper Bound; Optimization
Department/Centre:	Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited:	06 Oct 2022 11:00
Last Modified:	06 Oct 2022 11:00
URI:	https://eprints.iisc.ac.in/id/eprint/77250
