Two-timescale Q-learning Algorithms with an Application to Routing in Networks

Mohan Babu, K and Bhatnagar, Shalabh (2007) Two-timescale Q-learning Algorithms with an Application to Routing in Networks. In: International Conference on Advances in Control and Optimization of Dynamical Systems (ACODS), Feb. 2007, Bangalore.

PDF: 10.1.1.130.7691.pdf (Published Version, 2MB)

Official URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=1...

Abstract

We propose two variants of the Q-learning algorithm, both of which use two timescales. One variant updates the Q-values of all feasible state-action pairs at each instant, while the other updates only the Q-values of states paired with actions chosen according to the 'current' (iteratively updated) randomized policy. A sketch of the convergence analysis of the algorithms is given. Finally, we present numerical experiments that apply the proposed algorithms to routing on different network topologies and compare their performance with that of the regular Q-learning algorithm.
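The two variants described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the step sizes (1/k^0.6 for Q-values, 0.5/(k+10) for the policy), the rule that nudges the randomized policy toward the greedy action on the slower timescale (the keywords suggest the paper uses an SPSA-based policy update instead), the function names `two_timescale_q` and `greedy_policy`, and the toy four-node network are all hypothetical. Costs are minimised, as in routing with link delays.

```python
import random


def two_timescale_q(actions, step, cost, sources, dest, gamma=0.9,
                    n_iter=20000, full_update=True, seed=0):
    """Hypothetical two-timescale Q-learning sketch (cost minimisation).

    actions[x]      -> list of feasible actions at state x
    step(x, u, rng) -> next state (a simulator)
    cost(x, u)      -> one-stage cost
    full_update=True : variant 1, update every feasible (x, u) pair each step
    full_update=False: variant 2, update only (s, u) with u ~ pi(s, .)
    """
    rng = random.Random(seed)
    Q = {(x, u): 0.0 for x in actions for u in actions[x]}
    pair_visits = {k: 0 for k in Q}
    state_visits = {x: 0 for x in actions}
    # Randomized policy, initially uniform over feasible actions.
    pi = {x: {u: 1.0 / len(actions[x]) for u in actions[x]} for x in actions}
    s = rng.choice(sources)
    for _ in range(n_iter):
        if full_update:
            pairs = list(Q)
        else:
            u = rng.choices(list(pi[s]), weights=list(pi[s].values()))[0]
            pairs = [(s, u)]
        # Faster timescale: Q-learning update with step b = 1 / k^0.6.
        for x, u in pairs:
            pair_visits[(x, u)] += 1
            b = 1.0 / pair_visits[(x, u)] ** 0.6
            y = step(x, u, rng)
            target = cost(x, u) + gamma * min(Q[(y, v)] for v in actions[y])
            Q[(x, u)] += b * (target - Q[(x, u)])
        # Slower timescale (assumed rule): move pi(x, .) toward the greedy
        # action with step a = 0.5 / (k + 10), so that a / b -> 0.
        for x in {x for x, _ in pairs}:
            state_visits[x] += 1
            a = 0.5 / (state_visits[x] + 10)
            greedy = min(actions[x], key=lambda v: Q[(x, v)])
            for v in actions[x]:
                pi[x][v] += a * ((1.0 if v == greedy else 0.0) - pi[x][v])
        if not full_update:
            # Follow the sampled route; when the packet reaches the
            # destination, inject a new one at a random source node.
            s = rng.choice(sources) if y == dest else y
    return Q, pi


def greedy_policy(Q, actions):
    """Next hop with the smallest Q-value (estimated cost-to-go) per node."""
    return {x: min(actions[x], key=lambda v: Q[(x, v)]) for x in actions}


# Toy 4-node network (hypothetical): action = next hop, cost = link delay.
ACTIONS = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [3]}
DELAY = {(0, 1): 1, (0, 2): 1, (1, 0): 1, (1, 3): 5,
         (2, 0): 1, (2, 3): 1, (3, 3): 0}

if __name__ == "__main__":
    for full in (True, False):
        Q, _ = two_timescale_q(ACTIONS, lambda x, u, rng: u,
                               lambda x, u: DELAY[(x, u)],
                               sources=[0, 1, 2], dest=3, full_update=full)
        print("full_update" if full else "policy-driven", greedy_policy(Q, ACTIONS))
```

Setting `full_update=True` mimics the first variant (all feasible pairs updated at every instant), and `full_update=False` mimics the second (only the pair chosen by the current randomized policy). Because the policy step shrinks faster than the Q-value step, the Q-values effectively see a quasi-static policy, which is the usual intuition behind two-timescale schemes.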

Item Type:	Conference Paper
Keywords:	Q-learning based algorithms;Markov decision processes;two-timescale stochastic approximation;simultaneous perturbation stochastic approximation (SPSA);normalized Hadamard matrices
Department/Centre:	Division of Electrical Sciences > Computer Science & Automation
Date Deposited:	17 Oct 2011 05:22
Last Modified:	17 Oct 2011 05:22
URI:	http://eprints.iisc.ac.in/id/eprint/41467
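The keywords mention SPSA with normalized Hadamard matrices. One common way to build such deterministic perturbations, assumed here because this page does not give the paper's exact construction, is the Sylvester recursion: for N parameters take P = 2^ceil(log2(N+1)), drop the all-ones first column of the P x P matrix, and cycle through its rows as the +/-1 perturbation vectors. The function names below are hypothetical, and the one-sided gradient estimate is a minimal sketch, not the authors' code.

```python
import math


def sylvester_hadamard(order):
    """Normalized Hadamard matrix (first row and column all +1) built by the
    Sylvester recursion H_2k = [[H_k, H_k], [H_k, -H_k]]; order a power of 2."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of 2")
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def hadamard_perturbations(n_params):
    """Cyclic deterministic +/-1 perturbation vectors: columns 2..N+1 of H_P."""
    P = 2 ** math.ceil(math.log2(n_params + 1))
    return [row[1:n_params + 1] for row in sylvester_hadamard(P)]


def averaged_spsa_gradient(J, theta, delta=1e-2):
    """One-sided SPSA gradient estimate averaged over one full Hadamard cycle."""
    perts = hadamard_perturbations(len(theta))
    base = J(theta)
    grad = [0.0] * len(theta)
    for d in perts:
        shifted = [t + delta * di for t, di in zip(theta, d)]
        diff = J(shifted) - base
        for i, di in enumerate(d):
            grad[i] += diff / (delta * di) / len(perts)
    return grad


if __name__ == "__main__":
    J = lambda t: sum(x * x for x in t)      # gradient is 2 * theta
    print(averaged_spsa_gradient(J, [1.0, -2.0, 0.5]))
```

Because the retained columns are mutually orthogonal and each is orthogonal to the dropped all-ones column, the cross terms Delta_j/Delta_i and the bias term 1/Delta_i average to zero over one cycle; for a quadratic cost the averaged estimate therefore matches the true gradient up to floating-point rounding.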


