Zaki, M and Mohan, A and Gopalan, A (2022) Improved Pure Exploration in Linear Bandits with No-Regret Learning. In: 31st International Joint Conference on Artificial Intelligence, IJCAI 2022, 23 - 2022, Vienna, pp. 3709-3715.
Abstract
We study the best arm identification problem in linear multi-armed bandits (LMAB) in the fixed confidence setting; this is also the problem of optimizing an unknown linear function over a discrete domain with noisy, zeroth-order access. We propose an explicitly implementable algorithm for best arm identification with provably order-optimal sample complexity. Existing approaches that achieve optimal sample complexity assume either access to a nontrivial minimax optimization oracle (e.g. RAGE and Lazy-TS) or knowledge of an upper bound on the norm of the unknown parameter vector (e.g. LinGame). Our algorithm, which we call the Phased Elimination Linear Exploration Game (PELEG), maintains a high-probability confidence ellipsoid containing the unknown vector and uses it to eliminate suboptimal arms in phases, where a minimax problem is essentially solved in each phase using two-player low-regret learners. PELEG does not require knowledge of a bound on the norm of the unknown vector and is asymptotically optimal, settling an open problem. We show that the sample complexity of PELEG matches, up to order and in a non-asymptotic sense, an instance-dependent universal lower bound for linear bandits. PELEG is thus the first algorithm to achieve both order-optimal sample complexity and explicit implementability for this setting. We also provide numerical results for the proposed algorithm that are consistent with its theoretical guarantees. © 2022 International Joint Conferences on Artificial Intelligence. All rights reserved.
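The abstract's idea of solving a minimax problem with two low-regret learners can be illustrated on a much simpler object. The following is a minimal sketch, not the PELEG algorithm: it assumes a small finite zero-sum matrix game (a hypothetical stand-in for the per-phase problem), and lets two exponential-weights (Hedge) learners play against each other. The function names `hedge_weights` and `solve_matrix_game`, the step sizes, and the example payoff matrix are all assumptions made for illustration.

```python
import math


def hedge_weights(scores, eta):
    """Exponential-weights distribution proportional to exp(eta * score)."""
    top = max(eta * s for s in scores)  # subtract the max for numerical stability
    weights = [math.exp(eta * s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def solve_matrix_game(payoff, rounds=20000):
    """Approximate min_p max_q p^T M q with two Hedge learners (illustrative only).

    The row player minimizes, the column player maximizes. Returns the
    time-averaged strategies and lower/upper bounds on the game value.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    eta_row = math.sqrt(math.log(n_rows) / rounds)  # assumed step-size choice
    eta_col = math.sqrt(math.log(n_cols) / rounds)
    row_loss = [0.0] * n_rows  # cumulative expected loss of each row action
    col_gain = [0.0] * n_cols  # cumulative expected gain of each column action
    p_avg = [0.0] * n_rows
    q_avg = [0.0] * n_cols
    for _ in range(rounds):
        # Both players commit to mixed strategies before seeing the other's move.
        p = hedge_weights([-x for x in row_loss], eta_row)
        q = hedge_weights(col_gain, eta_col)
        for i in range(n_rows):
            p_avg[i] += p[i] / rounds
            row_loss[i] += sum(payoff[i][j] * q[j] for j in range(n_cols))
        for j in range(n_cols):
            q_avg[j] += q[j] / rounds
            col_gain[j] += sum(payoff[i][j] * p[i] for i in range(n_rows))
    # Best responses to the averaged strategies bracket the game value.
    upper = max(sum(p_avg[i] * payoff[i][j] for i in range(n_rows))
                for j in range(n_cols))
    lower = min(sum(payoff[i][j] * q_avg[j] for j in range(n_cols))
                for i in range(n_rows))
    return p_avg, q_avg, lower, upper


if __name__ == "__main__":
    # Hypothetical 2x2 game; its minimax value is 1/5.
    p, q, lower, upper = solve_matrix_game([[2.0, -1.0], [-1.0, 1.0]])
    print(p, q, lower, upper)
```

The gap `upper - lower` is bounded by the sum of the two players' average regrets, so it shrinks as `rounds` grows. This is the general reason low-regret learners can stand in for an exact minimax oracle.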
Item Type: | Conference Paper |
---|---|
Publication: | IJCAI International Joint Conference on Artificial Intelligence |
Publisher: | International Joint Conferences on Artificial Intelligence |
Additional Information: | The copyright for this article belongs to the International Joint Conferences on Artificial Intelligence. |
Keywords: | Artificial intelligence; Computational complexity, Complexity algorithms; Discrete domains; Identification problem; Linear functions; Minimax optimization; Multiarmed bandits (MABs); No-regret learning; Optimal samples; Sample complexity; Upper Bound, Optimization |
Department/Centre: | Division of Electrical Sciences > Electrical Communication Engineering |
Date Deposited: | 06 Oct 2022 11:00 |
Last Modified: | 06 Oct 2022 11:00 |
URI: | https://eprints.iisc.ac.in/id/eprint/77250 |