Bhatnagar, Shalabh and Babu, Mohan K (2008) New algorithms of the Q-learning type. In: Automatica, 44 (4). pp. 1111-1119.
Official URL: http://www.sciencedirect.com/science?_ob=ArticleUR...
Abstract
We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state–action pairs at each instant, while the second updates the Q-values of states with actions chosen according to the 'current' randomized policy updates. A proof of convergence is given for both algorithms. Finally, numerical experiments applying the proposed algorithms to routing in communication networks are presented under a few different settings.
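To illustrate the flavor of the first algorithm described in the abstract, here is a minimal two-timescale sketch on a hypothetical toy MDP (not from the paper): Q-values of all state–action pairs are updated synchronously on the faster timescale, while a randomized policy is nudged toward the greedy policy on the slower timescale. The MDP, step-size schedules, and policy-update rule are illustrative assumptions, not the authors' actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: 2 states, 2 actions; each action keeps the
# current state, action 1 pays reward 1 and action 0 pays reward 0.
nS, nA = 2, 2
P = np.zeros((nS, nA, nS))          # P[s, a, s'] transition probabilities
for s in range(nS):
    P[s, :, s] = 1.0                # deterministic self-loops
R = np.array([[0.0, 1.0],
              [0.0, 1.0]])          # R[s, a] expected one-step reward
gamma = 0.9

Q = np.zeros((nS, nA))              # Q-values: fast timescale
pi = np.full((nS, nA), 1.0 / nA)    # randomized policy: slow timescale

for n in range(1, 5001):
    a_n = 1.0 / n                           # faster step size (Q-values)
    b_n = 1.0 / (1.0 + n * np.log(n + 1))   # slower step size (policy)

    # Synchronous update: all feasible (s, a) pairs at each instant.
    for s in range(nS):
        for a in range(nA):
            s2 = rng.choice(nS, p=P[s, a])  # simulated next state
            Q[s, a] += a_n * (R[s, a] + gamma * Q[s2].max() - Q[s, a])

    # Slow-timescale drift of the randomized policy toward the greedy
    # policy implied by the current Q-values, kept on the simplex.
    greedy = np.eye(nA)[Q.argmax(axis=1)]
    pi += b_n * (greedy - pi)
    pi = np.clip(pi, 1e-3, None)
    pi /= pi.sum(axis=1, keepdims=True)

print(Q)   # action 1 should dominate action 0 in every state
print(pi)  # policy concentrates on action 1
```

Because the fast iterate sees the slow iterate as quasi-static (and vice versa, the slow iterate sees the fast one as equilibrated), the coupled recursion behaves like a nested loop without actually nesting the loops; this separation of step-size rates is the core of the two-timescale methodology the abstract refers to.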
| Item Type | Journal Article |
|---|---|
| Publication | Automatica |
| Publisher | Elsevier Science |
| Additional Information | Copyright of this article belongs to Elsevier Science. |
| Keywords | Q-learning; Reinforcement learning; Markov decision processes; Two-timescale stochastic approximation; SPSA |
| Department/Centre | Division of Electrical Sciences > Computer Science & Automation |
| Date Deposited | 25 Mar 2010 11:19 |
| Last Modified | 21 Feb 2019 11:30 |
| URI | http://eprints.iisc.ac.in/id/eprint/26525 |