ePrints@IISc

Bounds for Off-policy Prediction in Reinforcement Learning

Joseph, Ajin George and Bhatnagar, Shalabh (2017) Bounds for Off-policy Prediction in Reinforcement Learning. In: International Joint Conference on Neural Networks (IJCNN), May 14-19, 2017, Anchorage, AK, pp. 3991-3997.

PDF: Int_Joi_Con_Net_3991_2017.pdf - Published Version (restricted to registered users)
Official URL: http://dx.doi.org/10.1109/IJCNN.2017.7966359

Abstract

In this paper, we provide, for the first time, error bounds for off-policy prediction in reinforcement learning. The primary objective in off-policy prediction is to estimate the value function of a given target policy of interest, using a linear function approximation architecture, from a sample trajectory generated by a behaviour policy that is possibly different from the target policy. The stability of off-policy prediction was an open question for a long time; only recently could Yu provide a general proof, which makes our results more appealing to the reinforcement learning community. Off-policy prediction is useful in complex reinforcement learning settings where an on-policy sample trajectory is hard to obtain and one has to rely on the sample behaviour of the system under an arbitrary policy. We provide here an error bound on the solution of the off-policy prediction in terms of a closeness measure between the target and the behaviour policies.
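The setting described in the abstract can be illustrated with a minimal sketch (not the paper's algorithm): off-policy TD(0) prediction with linear function approximation and per-step importance-sampling corrections, run on a toy two-state MDP. The MDP, the policies, and all variable names here are illustrative assumptions, not taken from the paper; with one-hot features the setup is tabular, so the iterates track the target policy's true value function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action MDP (illustrative, not from the paper).
n_states, n_actions, d = 2, 2, 2
phi = np.eye(n_states)                   # one-hot features: V(s) ~ phi(s) @ theta
P = np.array([[[0.9, 0.1], [0.1, 0.9]],  # P[s, a] = next-state distribution
              [[0.8, 0.2], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                # R[s, a] = expected reward
              [0.0, 1.0]])

pi = np.array([[0.9, 0.1], [0.9, 0.1]])  # target policy  pi(a | s)
mu = np.array([[0.5, 0.5], [0.5, 0.5]])  # behaviour policy mu(a | s)
gamma, alpha = 0.9, 0.05

# Ground truth: V_pi = (I - gamma * P_pi)^{-1} r_pi.
P_pi = np.einsum('sa,sap->sp', pi, P)
r_pi = (pi * R).sum(axis=1)
v_true = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

theta = np.zeros(d)
theta_bar, count = np.zeros(d), 0        # tail average to smooth the noisy iterates
s = 0
for t in range(150_000):
    a = rng.choice(n_actions, p=mu[s])   # act according to the behaviour policy
    rho = pi[s, a] / mu[s, a]            # per-step importance-sampling ratio
    s_next = rng.choice(n_states, p=P[s, a])
    # Off-policy TD(0) update, corrected by rho.
    delta = R[s, a] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * rho * delta * phi[s]
    if t >= 50_000:
        theta_bar += theta
        count += 1
    s = s_next
theta_bar /= count                       # approaches v_true as the run lengthens
```

The importance-sampling ratio `rho` reweights each update so that, in expectation, the fixed point is the value function of the target policy even though every transition is sampled under the behaviour policy; the closer `mu` is to `pi`, the smaller the variance of the correction, which is the kind of closeness the paper's bound is stated in terms of.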

Item Type: Conference Proceedings
Additional Information: Copyright for this article belongs to IEEE, 345 E 47th St, New York, NY 10017, USA
Department/Centre: Division of Interdisciplinary Research > Supercomputer Education & Research Centre
Depositing User: Id for Latest eprints
Date Deposited: 13 Apr 2018 19:56
Last Modified: 23 Oct 2018 14:48
URI: http://eprints.iisc.ac.in/id/eprint/59554
