Bhatnagar, Shalabh and Borkar, VS and Akarapu, Madhukar (2006) A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events. In: Journal of Machine Learning Research, 7. pp. 1937-1962.
Abstract
We study the problem of long-run average cost control of Markov chains conditioned on a rare event. In a related recent work, a simulation-based algorithm for estimating performance measures associated with a Markov chain conditioned on a rare event has been developed. We extend ideas from this work and develop an adaptive algorithm for obtaining, online, optimal control policies conditioned on a rare event. Our algorithm uses three timescales or step-size schedules. On the slowest timescale, a gradient search algorithm for policy updates that is based on one-simulation simultaneous perturbation stochastic approximation (SPSA) type estimates is used. Deterministic perturbation sequences obtained from appropriate normalized Hadamard matrices are used here. The fast timescale recursions compute the conditional transition probabilities of an associated chain by obtaining solutions to the multiplicative Poisson equation (for a given policy estimate). Further, the risk parameter associated with the value function for a given policy estimate is updated on a timescale that lies in between the two scales above. We briefly sketch the convergence analysis of our algorithm and present a numerical application in the setting of routing multiple flows in communication networks.
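The one-simulation SPSA estimate with deterministic Hadamard perturbations mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits the rare-event conditioning, the multiplicative Poisson equation, and the three step-size schedules. It assumes a Sylvester-type Hadamard matrix (first row and column all +1) and, for illustration only, averages the estimate over one full perturbation cycle instead of feeding each estimate into a stochastic approximation recursion. The names `hadamard`, `perturbation_cycle`, `one_sim_spsa_gradient`, and the parameter `delta` are hypothetical.

```python
import numpy as np


def hadamard(order):
    """Sylvester construction of a Hadamard matrix; order must be a power of 2.

    The result has an all-ones first row and first column.
    """
    H = np.array([[1.0]])
    while H.shape[0] < order:
        H = np.block([[H, H], [H, -H]])
    return H


def perturbation_cycle(n_params):
    """Return a (P x n_params) array of +/-1 perturbation vectors.

    P is the smallest power of 2 that is at least n_params + 1. The all-ones
    first column is dropped, so every remaining column sums to zero and
    distinct columns are orthogonal.
    """
    order = 1
    while order < n_params + 1:
        order *= 2
    return hadamard(order)[:, 1:n_params + 1]


def one_sim_spsa_gradient(cost, theta, delta=0.1):
    """Average the one-simulation SPSA estimate over one perturbation cycle.

    Each estimate uses a single cost evaluation at theta + delta * d and
    divides it componentwise by delta * d.
    """
    cycle = perturbation_cycle(len(theta))
    estimates = [cost(theta + delta * d) / (delta * d) for d in cycle]
    return np.mean(estimates, axis=0)


# Hypothetical affine cost, used only to check the estimator.
true_grad = np.array([1.0, -3.0, 0.5])
cost = lambda th: 2.0 + np.dot(true_grad, th)
theta = np.array([0.2, -0.1, 0.4])
grad = one_sim_spsa_gradient(cost, theta)
```

For an affine cost the cycle average recovers the gradient up to rounding, because the columns of the perturbation matrix sum to zero and are mutually orthogonal, which cancels the cost-at-theta term and the cross terms. For a general smooth cost the estimate carries a bias that depends on `delta`.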
Item Type: | Journal Article |
---|---|
Publication: | Journal of Machine Learning Research |
Publisher: | Journal of Machine Learning Research |
Additional Information: | Copyright of this article belongs to Journal of Machine Learning Research. |
Keywords: | Markov decision processes; optimal control conditioned on a rare event; simulation based algorithms; SPSA with deterministic perturbations; reinforcement learning |
Department/Centre: | Division of Electrical Sciences > Computer Science & Automation |
Date Deposited: | 30 May 2008 |
Last Modified: | 27 Feb 2019 10:20 |
URI: | http://eprints.iisc.ac.in/id/eprint/14129 |