Bhatnagar, Shalabh (2010) An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes. In: Systems & Control Letters, 59 (12). pp. 760-766.
PDF: An_actor.pdf - Published Version (274kB, restricted to registered users)
Abstract
We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.
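The abstract describes the algorithm only at a high level. The sketch below is a hypothetical Python illustration of the general scheme it mentions: a Lagrangian relaxation of a constrained discounted-cost MDP, a TD(0) critic with linear function approximation, an SPSA-based actor update on the policy parameters, and a Lagrange multiplier update on a slower timescale. It is not the paper's algorithm; the toy MDP, one-hot features, single long trajectory per evaluation, step-size schedules, and the constraint bound `alpha` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy constrained MDP (all quantities below are made up for illustration).
rng = np.random.default_rng(0)
S, A, gamma, alpha = 5, 2, 0.9, 2.0          # states, actions, discount factor, constraint bound
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
cost = rng.uniform(0.0, 1.0, size=(S, A))    # single-stage cost c(s, a)
g = rng.uniform(0.0, 1.0, size=(S, A))       # single-stage constraint cost g(s, a)
s0 = 0                                       # start state for the discounted criterion

def pi(theta, s):
    """Softmax (Boltzmann) policy over actions in state s, parameterized by theta."""
    z = theta[s] - theta[s].max()
    e = np.exp(z)
    return e / e.sum()

def evaluate(theta, lam, steps=200, lr=0.1):
    """Run one trajectory from s0 under policy theta.

    A TD(0) critic with linear (here one-hot) features estimates the discounted
    Lagrangian value; the discounted constraint cost from s0 is accumulated directly.
    Returns (estimated Lagrangian value of s0, discounted constraint cost estimate).
    """
    v = np.zeros(S)                 # critic weights (one-hot features => one weight per state)
    s, disc, G = s0, 1.0, 0.0
    for _ in range(steps):
        a = rng.choice(A, p=pi(theta, s))
        s_next = rng.choice(S, p=P[s, a])
        c_lagr = cost[s, a] + lam * g[s, a]                # Lagrangian single-stage cost
        delta = c_lagr + gamma * v[s_next] - v[s]          # TD(0) error
        v[s] += lr * delta
        G += disc * g[s, a]
        disc *= gamma
        s = s_next
    return v[s0], G

theta = np.zeros((S, A))   # actor (policy) parameters
lam = 0.0                  # Lagrange multiplier

for n in range(1, 2001):
    # Step sizes chosen so the multiplier moves on a slower timescale than the actor
    # (a_n / b_n -> 0); the critic, with its constant inner-loop step, is fastest.
    b_n = 1.0 / (1 + n) ** 0.6      # actor step size
    a_n = 1.0 / (1 + n)             # Lagrange multiplier step size (slowest)
    c_n = 1.0 / (1 + n) ** 0.25     # SPSA perturbation size

    # Two-sided SPSA estimate of the gradient of the Lagrangian value w.r.t. theta.
    Delta = rng.choice([-1.0, 1.0], size=theta.shape)
    J_plus, _ = evaluate(theta + c_n * Delta, lam)
    J_minus, G_hat = evaluate(theta - c_n * Delta, lam)   # constraint estimate reused for brevity
    grad_hat = (J_plus - J_minus) / (2.0 * c_n * Delta)

    theta = theta - b_n * grad_hat                         # actor: descend the Lagrangian
    lam = max(0.0, lam + a_n * (G_hat - alpha))            # multiplier: ascend on constraint violation

print("Lagrange multiplier:", lam)
print("policy at start state:", pi(theta, s0))
```

The two trajectories per iteration and the projection of the multiplier onto [0, inf) are standard choices for this kind of Lagrangian SPSA scheme; the published paper should be consulted for the actual algorithm, its step-size conditions, and the convergence proof.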
| Item Type: | Journal Article |
|---|---|
| Publication: | Systems & Control Letters |
| Publisher: | Elsevier Science B.V. |
| Additional Information: | Copyright of this article belongs to Elsevier Science B.V. |
| Keywords: | Constrained Markov decision processes; Infinite horizon discounted cost criterion; Function approximation; Actor-critic algorithm; Simultaneous perturbation stochastic approximation |
| Department/Centre: | Division of Electrical Sciences > Computer Science & Automation |
| Date Deposited: | 30 Mar 2011 07:33 |
| Last Modified: | 30 Mar 2011 07:33 |
| URI: | http://eprints.iisc.ac.in/id/eprint/36331 |