Joseph, Ajin George and Bhatnagar, Shalabh (2017) Bounds for Off-policy Prediction in Reinforcement Learning. In: International Joint Conference on Neural Networks (IJCNN), May 14-19, 2017, Anchorage, AK, pp. 3991-3997.
PDF: Int_Joi_Con_Net_3991_2017.pdf (Published Version; restricted to registered users)
Abstract
In this paper, we provide, for the first time, error bounds for off-policy prediction in reinforcement learning. The primary objective in off-policy prediction is to estimate the value function of a given target policy of interest, under a linear function approximation architecture, using a sample trajectory generated by a behaviour policy that is possibly different from the target policy. The stability of off-policy prediction was an open question for a long time; only recently could Yu provide a generalized proof of stability, which makes our results more appealing to the reinforcement learning community. Off-policy prediction is useful in complex reinforcement learning settings where an on-policy sample trajectory is hard to obtain and one has to rely on the sample behaviour of the system under an arbitrary policy. We provide here an error bound on the solution of the off-policy prediction problem in terms of a closeness measure between the target and behaviour policies.
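To make the setting concrete, below is a minimal Python sketch of off-policy value-function prediction with linear function approximation, using per-decision importance-sampling TD(0). This is an illustrative assumption about the setting, not the algorithm or the bound analysed in the paper; the function names, the toy two-state chain, and the policies are hypothetical.

```python
import numpy as np

def off_policy_td0(features, states, actions, rewards, next_states,
                   target_policy, behaviour_policy, alpha=0.01, gamma=0.95):
    """Off-policy TD(0) with linear function approximation and per-decision
    importance sampling (a hedged sketch, not the paper's algorithm).

    Estimates the value function of `target_policy` from a trajectory
    generated by `behaviour_policy`, reweighting each update by
    rho = pi(a|s) / mu(a|s).
    """
    theta = np.zeros_like(features(states[0]), dtype=float)      # linear weights
    for s, a, r, s_next in zip(states, actions, rewards, next_states):
        rho = target_policy(a, s) / behaviour_policy(a, s)       # importance weight pi/mu
        phi, phi_next = features(s), features(s_next)
        td_error = r + gamma * phi_next @ theta - phi @ theta    # one-step TD error
        theta = theta + alpha * rho * td_error * phi             # importance-weighted update
    return theta

# Hypothetical two-state chain, purely illustrative.
features = lambda s: np.array([1.0, float(s)])           # simple 2-d feature map
target_policy = lambda a, s: 0.5                          # pi(a|s): uniform over two actions
behaviour_policy = lambda a, s: 0.7 if a == 0 else 0.3    # mu(a|s): biased towards action 0
theta = off_policy_td0(features,
                       states=[0, 1, 0, 1], actions=[0, 1, 0, 1],
                       rewards=[0.0, 1.0, 0.0, 1.0], next_states=[1, 0, 1, 0],
                       target_policy=target_policy, behaviour_policy=behaviour_policy)
print(theta)
```

Note that this naive importance-weighted TD(0) update is known to be potentially unstable under off-policy sampling with function approximation, which is precisely the stability concern the abstract refers to; gradient-TD or least-squares methods are common alternatives in practice.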
| Item Type: | Conference Proceedings |
|---|---|
| Series: | IEEE International Joint Conference on Neural Networks (IJCNN) |
| Publisher: | IEEE, 345 E 47th St, New York, NY 10017, USA |
| Additional Information: | Copyright for this article belongs to IEEE, 345 E 47th St, New York, NY 10017, USA |
| Department/Centre: | Division of Interdisciplinary Sciences > Supercomputer Education & Research Centre |
| Date Deposited: | 13 Apr 2018 19:56 |
| Last Modified: | 23 Oct 2018 14:48 |
| URI: | http://eprints.iisc.ac.in/id/eprint/59554 |