Bhatnagar, Shalabh and Babu, Mohan K (2008) New algorithms of the Q-learning type. In: Automatica, 44 (4). pp. 1111-1119.
Official URL: http://www.sciencedirect.com/science?_ob=ArticleUR...
Abstract
We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state–action pairs at each instant, while the second updates the Q-values of states with actions chosen according to the 'current' randomized policy updates. A proof of convergence is given for both algorithms. Finally, numerical experiments applying the proposed algorithms to routing in communication networks are presented under a few different settings.
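To illustrate the flavor of the first algorithm described in the abstract, here is a minimal two-timescale sketch on a hypothetical toy MDP (not from the paper): Q-values of all state–action pairs are updated synchronously on the faster timescale, while a randomized policy is nudged toward the greedy policy on the slower timescale. The MDP, step-size schedules, and policy-update rule are illustrative assumptions, not the authors' actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: 2 states, 2 actions; each action keeps the
# current state, action 1 pays reward 1 and action 0 pays reward 0.
nS, nA = 2, 2
P = np.zeros((nS, nA, nS))          # P[s, a, s'] transition probabilities
for s in range(nS):
    P[s, :, s] = 1.0                # deterministic self-loops
R = np.array([[0.0, 1.0],
              [0.0, 1.0]])          # R[s, a] expected one-step reward
gamma = 0.9

Q = np.zeros((nS, nA))              # Q-values: fast timescale
pi = np.full((nS, nA), 1.0 / nA)    # randomized policy: slow timescale

for n in range(1, 5001):
    a_n = 1.0 / n                           # faster step size (Q-values)
    b_n = 1.0 / (1.0 + n * np.log(n + 1))   # slower step size (policy)

    # Synchronous update: all feasible (s, a) pairs at each instant.
    for s in range(nS):
        for a in range(nA):
            s2 = rng.choice(nS, p=P[s, a])  # simulated next state
            Q[s, a] += a_n * (R[s, a] + gamma * Q[s2].max() - Q[s, a])

    # Slow-timescale drift of the randomized policy toward the greedy
    # policy implied by the current Q-values, kept on the simplex.
    greedy = np.eye(nA)[Q.argmax(axis=1)]
    pi += b_n * (greedy - pi)
    pi = np.clip(pi, 1e-3, None)
    pi /= pi.sum(axis=1, keepdims=True)

print(Q)   # action 1 should dominate action 0 in every state
print(pi)  # policy concentrates on action 1
```

Because the fast iterate sees the slow iterate as quasi-static (and vice versa, the slow iterate sees the fast one as equilibrated), the coupled recursion behaves like a nested loop without actually nesting the loops; this separation of step-size rates is the core of the two-timescale methodology the abstract refers to.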
| Item Type | Journal Article |
|---|---|
| Publication | Automatica |
| Publisher | Elsevier Science |
| Additional Information | Copyright of this article belongs to Elsevier Science. |
| Keywords | Q-learning; Reinforcement learning; Markov decision processes; Two-timescale stochastic approximation; SPSA |
| Department/Centre | Division of Electrical Sciences > Computer Science & Automation |
| Date Deposited | 25 Mar 2010 11:19 |
| Last Modified | 21 Feb 2019 11:30 |
| URI | http://eprints.iisc.ac.in/id/eprint/26525 |