Improved Pure Exploration in Linear Bandits with No-Regret Learning

Zaki, M and Mohan, A and Gopalan, A (2022) Improved Pure Exploration in Linear Bandits with No-Regret Learning. In: 31st International Joint Conference on Artificial Intelligence, IJCAI 2022, 23 - 2022, Vienna, pp. 3709-3715.

Official URL: https://doi.org/10.24963/ijcai.2022/515

Abstract

We study the best arm identification problem in linear multi-armed bandits (LMAB) in the fixed confidence setting; this is also the problem of optimizing an unknown linear function over a discrete domain with noisy, zeroth-order access. We propose an explicitly implementable algorithm for best arm identification with provably order-optimal sample complexity. Existing approaches that achieve optimal sample complexity assume either access to a nontrivial minimax optimization oracle (e.g. RAGE and Lazy-TS) or knowledge of an upper bound on the norm of the unknown parameter vector (e.g. LinGame). Our algorithm, which we call the Phased Elimination Linear Exploration Game (PELEG), maintains a high-probability confidence ellipsoid containing the unknown vector and uses it to eliminate suboptimal arms in phases, where a minimax problem is essentially solved in each phase using two-player low-regret learners. PELEG does not require knowledge of a bound on the norm of the unknown vector and is asymptotically optimal, settling an open problem. We show that the sample complexity of PELEG matches, up to order and in a non-asymptotic sense, an instance-dependent universal lower bound for linear bandits. PELEG is thus the first algorithm to achieve both order-optimal sample complexity and explicit implementability for this setting. We also provide numerical results for the proposed algorithm that are consistent with its theoretical guarantees. © 2022 International Joint Conferences on Artificial Intelligence. All rights reserved.
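
To make the phased-elimination idea in the abstract concrete, the following is a minimal sketch of a generic elimination loop for linear bandits. It is not PELEG: the per-phase minimax design solved with two-player low-regret learners is replaced here by uniform sampling over the surviving arms, and the function name `phased_elimination`, the doubling pull schedule, the tiny ridge term, and the confidence multiplier `beta` are all illustrative assumptions rather than quantities taken from the paper.

```python
import numpy as np

def phased_elimination(arms, theta_star, delta=0.05, noise_sd=1.0,
                       max_phases=20, seed=0):
    """Generic phased elimination for linear bandits (illustrative only).

    arms:       (K, d) array of arm feature vectors.
    theta_star: (d,) true parameter, used only to simulate noisy rewards.
    Returns the list of arm indices that survive elimination.
    """
    rng = np.random.default_rng(seed)
    K, d = arms.shape
    active = list(range(K))
    V = 1e-6 * np.eye(d)   # design matrix; tiny ridge keeps it invertible
    b = np.zeros(d)        # running sum of reward * feature

    for phase in range(1, max_phases + 1):
        if len(active) == 1:
            break
        # Assumed schedule: uniform pulls over surviving arms, doubling per phase.
        pulls_per_arm = (2 ** phase) * d
        for i in active:
            x = arms[i]
            rewards = x @ theta_star + noise_sd * rng.standard_normal(pulls_per_arm)
            V += pulls_per_arm * np.outer(x, x)
            b += rewards.sum() * x

        theta_hat = np.linalg.solve(V, b)   # least-squares estimate
        V_inv = np.linalg.inv(V)
        # Assumed confidence multiplier (union bound over arm pairs and phases).
        beta = noise_sd * np.sqrt(2 * np.log(K * K * phase * (phase + 1) / delta))

        best = max(active, key=lambda i: arms[i] @ theta_hat)
        survivors = []
        for i in active:
            y = arms[best] - arms[i]
            gap_hat = y @ theta_hat
            width = beta * np.sqrt(y @ V_inv @ y)   # ellipsoid width along y
            # Keep arm i unless its estimated gap clearly exceeds the width.
            if gap_hat <= width:
                survivors.append(i)
        active = survivors
    return active
```

For example, with the three standard basis vectors in R^3 as arms and a hypothetical parameter `theta_star = np.array([1.0, 0.5, 0.0])`, calling `phased_elimination(np.eye(3), theta_star)` would be expected to narrow the active set down to the first arm. Uniform sampling is generally wasteful compared with an allocation chosen to shrink the ellipsoid along the directions separating the surviving arms, which is the role the abstract assigns to the per-phase minimax problem.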

Item Type: Conference Paper
Publication: IJCAI International Joint Conference on Artificial Intelligence
Publisher: International Joint Conferences on Artificial Intelligence
Additional Information: The copyright for this article belongs to the International Joint Conferences on Artificial Intelligence.
Keywords: Artificial intelligence; Computational complexity; Complexity algorithms; Discrete domains; Identification problem; Linear functions; Minimax optimization; Multiarmed bandits (MABs); No-regret learning; Optimal samples; Sample complexity; Upper bound; Optimization
Department/Centre: Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited: 06 Oct 2022 11:00
Last Modified: 06 Oct 2022 11:00
URI: https://eprints.iisc.ac.in/id/eprint/77250
