Bhatnagar, Shalabh and Lakshmanan, K (2016) Multiscale Q-learning with linear function approximation. In: DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 26 (3). pp. 477-509.
Abstract
We present in this article a two-timescale variant of Q-learning with linear function approximation. Both Q-values and policies are assumed to be parameterized, with the policy parameter updated on a faster timescale than the Q-value parameter. This timescale separation is seen to result in significantly improved numerical performance of the proposed algorithm over Q-learning. We show that the proposed algorithm converges almost surely to a closed, connected, internally chain transitive invariant set of an associated differential inclusion.
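The two-timescale idea in the abstract can be illustrated with a minimal sketch. This is not the authors' update rule, which is given only in the full paper. Everything below is an assumption made for illustration: the softmax policy, the expected-value bootstrap used in place of the usual max, the step-size exponents, the one-hot features, and the toy two-state environment `toy_env`. The only property taken from the abstract is that the policy parameter `w` uses a larger step size (faster timescale) than the Q-value parameter `theta`.

```python
import math
import random

N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.5  # hypothetical toy problem sizes


def phi(s, a):
    """One-hot features for (s, a); the tabular case is a special linear case."""
    v = [0.0] * (N_STATES * N_ACTIONS)
    v[s * N_ACTIONS + a] = 1.0
    return v


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def step_sizes(n):
    """Assumed schedules: a_n (Q-value, slower) and b_n (policy, faster)."""
    a_n = 1.0 / (n + 1) ** 0.9
    b_n = 1.0 / (n + 1) ** 0.6  # a_n / b_n -> 0 as n grows
    return a_n, b_n


def policy(w, s):
    """Softmax policy over linear preferences w . phi(s, a) (an assumption)."""
    prefs = [dot(w, phi(s, a)) for a in range(N_ACTIONS)]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def toy_env(s, a, rng):
    """Hypothetical MDP: action 1 pays reward 1, action 0 pays 0; next state is uniform."""
    return float(a), rng.randrange(N_STATES)


def train(steps=20000, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * (N_STATES * N_ACTIONS)  # Q-value parameter (slow)
    w = [0.0] * (N_STATES * N_ACTIONS)      # policy parameter (fast)
    s = 0
    for n in range(steps):
        a_n, b_n = step_sizes(n)
        pi = policy(w, s)
        a = rng.choices(range(N_ACTIONS), weights=pi)[0]
        r, s_next = toy_env(s, a, rng)

        # Slower timescale: TD update of theta, bootstrapping on the
        # policy-averaged Q-value at the next state instead of a max.
        pi_next = policy(w, s_next)
        v_next = sum(p * dot(theta, phi(s_next, b)) for b, p in enumerate(pi_next))
        f = phi(s, a)
        delta = r + GAMMA * v_next - dot(theta, f)
        theta = [t + a_n * delta * fi for t, fi in zip(theta, f)]

        # Faster timescale: gradient ascent on sum_a pi(a|s) * Q(s, a) w.r.t. w.
        q = [dot(theta, phi(s, b)) for b in range(N_ACTIONS)]
        v = sum(p * qb for p, qb in zip(pi, q))
        for b in range(N_ACTIONS):
            fb = phi(s, b)
            w = [wi + b_n * pi[b] * (q[b] - v) * fi for wi, fi in zip(w, fb)]

        s = s_next
    return theta, w
```

Calling `theta, w = train()` and then `policy(w, s)` for each state shows which action the learned policy favors in this toy problem. The exponents 0.9 and 0.6 are one arbitrary choice among many for which the ratio of the slow step size to the fast one tends to zero.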
| Item Type: | Journal Article |
|---|---|
| Publication: | DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS |
| Publisher: | SPRINGER, VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS |
| Additional Information: | Copyright for this article belongs to SPRINGER, VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS |
| Department/Centre: | Division of Electrical Sciences > Computer Science & Automation |
| Date Deposited: | 25 Aug 2016 05:02 |
| Last Modified: | 25 Aug 2016 05:02 |
| URI: | http://eprints.iisc.ac.in/id/eprint/54360 |