A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events

Bhatnagar, Shalabh and Borkar, VS and Akarapu, Madhukar (2006) A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events. In: Journal of Machine Learning Research, 7. pp. 1937-1962.

Abstract

We study the problem of long-run average cost control of Markov chains conditioned on a rare event. In a related recent work, a simulation-based algorithm for estimating performance measures associated with a Markov chain conditioned on a rare event has been developed. We extend ideas from this work and develop an adaptive algorithm for obtaining, online, optimal control policies conditioned on a rare event. Our algorithm uses three timescales or step-size schedules. On the slowest timescale, a gradient search algorithm for policy updates that is based on one-simulation simultaneous perturbation stochastic approximation (SPSA) type estimates is used. Deterministic perturbation sequences obtained from appropriate normalized Hadamard matrices are used here. The fast timescale recursions compute the conditional transition probabilities of an associated chain by obtaining solutions to the multiplicative Poisson equation (for a given policy estimate). Further, the risk parameter associated with the value function for a given policy estimate is updated on a timescale that lies in between the two scales above. We briefly sketch the convergence analysis of our algorithm and present a numerical application in the setting of routing multiple flows in communication networks.
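The one-simulation SPSA estimate with deterministic Hadamard perturbations mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's three-timescale algorithm. It assumes a Sylvester-type normalized Hadamard matrix whose all-ones first column is dropped, and a deterministic `cost` function standing in for the simulated long-run average cost. The function names (`normalized_hadamard`, `perturbation_cycle`, `averaged_gradient_estimate`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def normalized_hadamard(order):
    """Sylvester construction of a Hadamard matrix whose first row and
    first column are all ones; `order` must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < order:
        H = np.block([[H, H], [H, -H]])
    return H

def perturbation_cycle(n_params):
    """Rows are the +/-1 perturbation vectors used in one full cycle.
    Assumption: the order is the smallest power of two >= n_params + 1,
    and the all-ones first column is discarded."""
    order = 1 << n_params.bit_length()
    return normalized_hadamard(order)[:, 1:n_params + 1]

def averaged_gradient_estimate(cost, theta, delta=1e-3):
    """Average the one-simulation estimates cost(theta + delta*d) / (delta*d_i)
    over one complete cycle of perturbation vectors d."""
    cycle = perturbation_cycle(theta.size)
    estimates = [cost(theta + delta * d) / (delta * d) for d in cycle]
    return np.mean(estimates, axis=0)

# Illustrative stand-in for a simulated cost: a linear function with
# gradient (3, -2, 0.5) and a nonzero offset.
def example_cost(theta):
    return 3.0 * theta[0] - 2.0 * theta[1] + 0.5 * theta[2] + 7.0

theta0 = np.array([1.0, 2.0, 3.0])
grad_estimate = averaged_gradient_estimate(example_cost, theta0)
```

Each retained column sums to zero, so the offset term cost(theta)/delta, which biases a single one-simulation estimate, averages out over a full cycle. The columns are also mutually orthogonal, which removes the cross terms between different parameters. For a nonlinear cost a bias of order `delta` can remain.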

Item Type: Journal Article
Publication: Journal of Machine Learning Research
Publisher: Journal of Machine Learning Research
Additional Information: Copyright of this article belongs to Journal of Machine Learning Research.
Keywords: Markov decision processes; optimal control conditioned on a rare event; simulation based algorithms; SPSA with deterministic perturbations; reinforcement learning
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 30 May 2008
Last Modified: 27 Feb 2019 10:20
URI: http://eprints.iisc.ac.in/id/eprint/14129
