Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference

Banerjee, D and Ghosh, A and Chowdhury, SR and Gopalan, A (2023) Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference. In: 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023, 25 - 27 April 2023, Valencia, pp. 8233-8262.

Official URL: https://proceedings.mlr.press/v206/banerjee23b.htm...

Abstract

We present a non-asymptotic lower bound on the spectrum of the design matrix generated by any linear bandit algorithm with sub-linear regret when the action set has well-behaved curvature. Specifically, we show that the minimum eigenvalue of the expected design matrix grows as Ω(√n) whenever the expected cumulative regret of the algorithm is O(√n), where n is the learning horizon, and the action space has a constant Hessian around the optimal arm. This shows that such action spaces force a polynomial lower bound on the least eigenvalue, rather than a logarithmic lower bound as shown by Lattimore and Szepesvari (2017) for discrete (i.e., well-separated) action spaces. Furthermore, while the latter holds only in the asymptotic regime (n → ∞), our result for these “locally rich” action spaces is any-time. Additionally, under a mild technical assumption, we obtain a similar lower bound on the minimum eigenvalue that holds with high probability. We apply our result to two practical scenarios: model selection and clustering in linear bandits. For model selection, we show that an epoch-based linear bandit algorithm adapts to the true model complexity at a rate exponential in the number of epochs, by virtue of our novel spectral bound. For clustering, we consider a multi-agent framework where we show, by leveraging the spectral result, that no forced exploration is necessary: the agents can run a linear bandit algorithm and estimate their underlying parameters at once, and hence incur a low regret.
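
To make the scaling in the abstract concrete, the following is a minimal numerical sketch, assuming the unit circle in two dimensions as the "locally rich" action set. It does not reproduce the paper's proof or any algorithm from it. The play sequence (angles shrinking like t^(-1/4) with alternating sign), the horizon n, and the function names `design_matrix` and `sphere_regret` are all hypothetical choices made for illustration. On the unit circle, an action at angle φ from the optimal arm costs regret 1 − cos φ ≈ φ²/2 and contributes sin² φ ≈ φ² to the design matrix in the direction orthogonal to the optimum, so cumulative regret of order √n goes together with a least eigenvalue of order √n.

```python
import numpy as np


def design_matrix(actions):
    """Unregularized design matrix V_n = sum_t a_t a_t^T (rows of `actions` are a_t)."""
    return actions.T @ actions


def sphere_regret(actions, theta):
    """Cumulative pseudo-regret on the unit sphere, where the optimal arm is theta/||theta||."""
    theta = theta / np.linalg.norm(theta)
    return float(np.sum(1.0 - actions @ theta))


# Assumed setup (not from the paper): d = 2, optimal arm e_1, horizon n.
n = 10_000
theta = np.array([1.0, 0.0])

# Hypothetical play sequence: deviate from the optimal arm by an angle of
# size t^(-1/4), alternating sides, so per-step regret is about t^(-1/2) / 2.
t = np.arange(1, n + 1, dtype=float)
signs = np.where(t % 2 == 0, 1.0, -1.0)
phi = signs * t ** -0.25
actions = np.column_stack([np.cos(phi), np.sin(phi)])

V = design_matrix(actions)
lam_min = float(np.linalg.eigvalsh(V)[0])  # eigvalsh returns ascending eigenvalues
regret = sphere_regret(actions, theta)

print(f"regret / sqrt(n)     = {regret / np.sqrt(n):.3f}")
print(f"lambda_min / sqrt(n) = {lam_min / np.sqrt(n):.3f}")
```

For this particular sequence both printed ratios stay of constant order, and the least eigenvalue is at most twice the cumulative regret, since sin² φ = (1 − cos φ)(1 + cos φ) ≤ 2(1 − cos φ). The sketch only illustrates how curvature ties regret to exploration in the orthogonal direction; the lower bound itself, for arbitrary low-regret algorithms, is what the paper establishes.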

Item Type:	Conference Paper
Publication:	Proceedings of Machine Learning Research
Publisher:	ML Research Press
Additional Information:	The copyright for this article belongs to the ML Research Press
Keywords:	Clustering algorithms; Learning algorithms; Matrix algebra; Multi agent systems; Action sets; Action spaces; Clusterings; Design matrix; Least eigenvalue; Low bound; Minimum eigenvalue; Model Selection; Non-asymptotic; Spectra's; Eigenvalues and eigenfunctions
Department/Centre:	Division of Electrical Sciences > Electrical Communication Engineering
Date Deposited:	01 Aug 2023 05:07
Last Modified:	01 Aug 2023 05:07
URI:	https://eprints.iisc.ac.in/id/eprint/82742

