Karmakar, Prasenjit and Bhatnagar, Shalabh (2018) Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning. In: MATHEMATICS OF OPERATIONS RESEARCH, 43 (1). pp. 130-151.
Full text not available from this repository.
Abstract
We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by "controlled" Markov noise. In particular, the faster and slower recursions have nonadditive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, we present a solution to the off-policy convergence problem for temporal-difference learning with linear function approximation, using our results.
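To make the two time-scale setting of the abstract concrete, here is a minimal sketch of one well-known two time-scale off-policy TD method with linear function approximation: TDC (linear TD with gradient correction) using importance-sampling ratios. The abstract does not say which off-policy TD algorithm the article analyzes, so this should not be read as the authors' method. The toy MDP, the policies `PI` and `MU`, the step-size schedules and the function name `tdc_off_policy` are all hypothetical choices for illustration. The state sequence plays the role of Markov noise whose transitions are controlled by the behaviour policy's actions; the auxiliary weights `w` run on the faster time scale and the value weights `theta` on the slower one.

```python
import numpy as np

# Hypothetical toy MDP (an assumption for illustration, not from the article):
# two states and two actions; taking action a moves the chain to state a,
# and the reward is 1 for action 1 and 0 for action 0.
N_STATES, GAMMA = 2, 0.5
PI = np.array([0.1, 0.9])  # target policy: probability of each action, any state
MU = np.array([0.5, 0.5])  # behaviour policy that actually generates the data
FEATURES = np.eye(N_STATES)  # one-hot rows: tabular case of linear features


def tdc_off_policy(n_steps=100_000, seed=0):
    """Off-policy TDC with importance sampling on two time scales."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(N_STATES)  # slow iterate: value-function weights
    w = np.zeros(N_STATES)      # fast iterate: auxiliary weights
    s = 0
    for n in range(n_steps):
        # Diminishing step sizes with alpha/beta -> 0: w runs on the faster
        # time scale, theta on the slower one.
        alpha = 0.1 / (1.0 + n / 1000.0) ** 0.7
        beta = 0.2 / (1.0 + n / 1000.0) ** 0.55
        a = int(rng.random() < MU[1])  # action sampled from the behaviour policy
        s_next, r = a, float(a)        # the action controls the next state
        rho = PI[a] / MU[a]            # importance-sampling ratio pi/mu
        phi, phi_next = FEATURES[s], FEATURES[s_next]
        delta = r + GAMMA * (theta @ phi_next) - theta @ phi  # TD error
        theta = theta + alpha * rho * (delta * phi - GAMMA * phi_next * (phi @ w))
        w = w + beta * (rho * delta - phi @ w) * phi
        s = s_next
    return theta


# Exact target-policy values for comparison: v = (I - gamma * P_pi)^{-1} r_pi.
P_pi = np.tile(PI, (N_STATES, 1))  # row s: next-state distribution under pi
r_pi = np.full(N_STATES, PI[1])    # expected one-step reward under pi
v_pi = np.linalg.solve(np.eye(N_STATES) - GAMMA * P_pi, r_pi)
theta = tdc_off_policy()
print("TDC estimate:", theta, "exact v_pi:", v_pi)
```

With one-hot (tabular) features the TD fixed point coincides with the target-policy values, so `theta` should end up close to `v_pi` even though every sample comes from the behaviour policy. With genuinely non-tabular linear features the iterates would instead approach the TD fixed point, which in general differs from `v_pi`.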
Item Type: | Journal Article |
---|---|
Publication: | MATHEMATICS OF OPERATIONS RESEARCH |
Publisher: | INFORMS, 5521 RESEARCH PARK DR, SUITE 200, CATONSVILLE, MD 21228 USA |
Additional Information: | Copyright for the article belongs to INFORMS, 5521 RESEARCH PARK DR, SUITE 200, CATONSVILLE, MD 21228 USA |
Department/Centre: | Division of Electrical Sciences > Computer Science & Automation |
Date Deposited: | 22 Mar 2018 13:57 |
Last Modified: | 22 Mar 2018 13:57 |
URI: | http://eprints.iisc.ac.in/id/eprint/59253 |