
Solution of MDPs using simulation-based value iteration

Abdulla, Mohammed Shahid and Bhatnagar, Shalabh (2005) Solution of MDPs using simulation-based value iteration. In: 2nd International Conference on Artificial Intelligence Applications and Innovations, SEP 07-09, 2005, Beijing.

Official URL: http://www.springerlink.com/content/1731x322652813...

Abstract

This article proposes a three-timescale simulation-based algorithm for the solution of infinite-horizon Markov Decision Processes (MDPs). We assume a finite state space and a discounted cost criterion, and adopt the value iteration approach. An approximation of the Dynamic Programming operator T is applied to the value function iterates. This 'approximate' operator is implemented using three timescales, the slowest of which updates the value function iterates. On the middle timescale, we perform a gradient search over the feasible action set of each state using Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates, thus finding the minimizing action in T. On the fastest timescale, the 'critic' estimates, over which the gradient search is performed, are obtained. A sketch of convergence, explaining the dynamics of the algorithm using associated ODEs, is also presented. Numerical experiments on rate-based flow control at a bottleneck node, using a continuous-time queueing model, are performed with the proposed algorithm. The results are verified against classical value iteration in which the feasible set is suitably discretized. In this discretized setting, the proposed algorithm is compared with a variant of the algorithm of [12] and is found to converge faster.
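
The structure described in the abstract can be illustrated with a small Python sketch. This is not the authors' algorithm: it is a deterministic, nested-loop simplification in which the inner SPSA-style search over a scalar action is run to near convergence before each value update, and the 'critic' is replaced by an exact one-step expectation rather than a simulation-based estimate. The two-state MDP, the cost constants TARGET and HOLD, the transition rule, and all step sizes are hypothetical choices made only for illustration. The result is checked against classical value iteration over a discretized action grid, in the spirit of the verification mentioned in the abstract.

```python
import numpy as np

GAMMA = 0.9                      # discount factor (assumed)
TARGET = np.array([0.3, 0.8])    # hypothetical per-state preferred action
HOLD = np.array([0.0, 0.2])      # hypothetical per-state holding cost


def q_value(s, a, V):
    """One-step cost plus discounted expected next value.

    Toy model (an assumption): from any state, the next state is 1 with
    probability a and 0 otherwise. The exact expectation stands in for
    the simulation-based critic of the paper.
    """
    cost = HOLD[s] + (a - TARGET[s]) ** 2
    return cost + GAMMA * ((1.0 - a) * V[0] + a * V[1])


def spsa_step(s, a, V, rng, c=0.05, step=0.1):
    """One projected descent step using a two-sided SPSA gradient estimate."""
    delta = rng.choice([-1.0, 1.0])          # random perturbation direction
    grad = (q_value(s, a + c * delta, V)
            - q_value(s, a - c * delta, V)) / (2.0 * c * delta)
    return float(np.clip(a - step * grad, 0.0, 1.0))  # project onto [0, 1]


def spsa_value_iteration(outer=200, inner=50, seed=0):
    """Value iteration whose minimization over actions uses SPSA steps."""
    rng = np.random.default_rng(seed)
    V = np.zeros(2)
    actions = np.full(2, 0.5)                # warm-started across iterations
    for _ in range(outer):
        for s in range(2):
            a = actions[s]
            for _ in range(inner):           # "faster" loop: action search
                a = spsa_step(s, a, V, rng)
            actions[s] = a
        # "slowest" loop: apply the approximate operator T to V
        V = np.array([q_value(s, actions[s], V) for s in range(2)])
    return V, actions


def grid_value_iteration(n_grid=101, outer=200):
    """Classical value iteration over a discretized action set."""
    grid = np.linspace(0.0, 1.0, n_grid)
    V = np.zeros(2)
    for _ in range(outer):
        V = np.array([min(q_value(s, a, V) for a in grid) for s in range(2)])
    return V


if __name__ == "__main__":
    V_spsa, acts = spsa_value_iteration()
    print("SPSA-based V:", V_spsa, "actions:", acts)
    print("Grid-based V:", grid_value_iteration())
```

In the paper the three updates run concurrently with step sizes of different orders instead of nested loops; the nesting here only mimics that separation of timescales. With a scalar action, the two-sided SPSA estimate coincides with a central difference, so the sketch behaves deterministically regardless of the random perturbation signs.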

Item Type: Conference Paper
Publisher: Springer
Additional Information: Copyright of this article belongs to Springer.
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 04 Jun 2010 10:27
Last Modified: 22 Feb 2012 06:53
URI: http://eprints.iisc.ac.in/id/eprint/27537
