
Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

Diddigi, RB and Jain, P and Prabuchandran, JK and Bhatnagar, S (2022) Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm. In: 2022 International Joint Conference on Neural Networks, IJCNN 2022, 18 - 23 July 2022, Padua.

Official URL: https://doi.org/10.1109/IJCNN55064.2022.9892303


Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as 'off-policy control' in RL, where an agent's objective is to compute an optimal policy based on data obtained from a given policy (known as the behavior policy). Because the optimal policy can be very different from the behavior policy, learning optimal behavior is much harder in the 'off-policy' setting than in the 'on-policy' setting, where new data generated by the updated policy is typically used in learning. This work proposes an off-policy natural actor-critic algorithm that uses state-action distribution correction to handle the off-policy behavior and the natural policy gradient for sample efficiency. Existing natural gradient-based actor-critic algorithms with convergence guarantees require fixed features for approximating both the policy and the value function, which often leads to sub-optimal learning in RL applications. In contrast, our proposed algorithm uses compatible features that allow arbitrary neural networks to approximate the policy and the value function while still guaranteeing convergence to a locally optimal policy. We illustrate the benefit of the proposed off-policy natural gradient algorithm by comparing it with the vanilla gradient actor-critic algorithm on benchmark RL tasks.
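The two ingredients named in the abstract, off-policy correction and the natural gradient with compatible features, can be illustrated on a toy problem. The sketch below is not the paper's algorithm. It assumes a single-state problem (a bandit) with a tabular softmax policy, uses exact expectations instead of samples, and lets a plain action-probability ratio stand in for the state-action distribution correction. All function names, the behavior policy `b`, and the action values `q` are hypothetical and chosen only for illustration. It checks the standard identity that, with compatible features psi(a) = grad log pi(a), the advantage vector w satisfies F w = g, where F is the Fisher matrix and g is the vanilla policy gradient, so w is a natural-gradient direction.

```python
import math


def softmax(theta):
    # Numerically stable softmax over a list of action preferences.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def compatible_features(pi, a):
    # grad_theta log pi(a) for a tabular softmax policy: one_hot(a) - pi.
    return [(1.0 if i == a else 0.0) - p for i, p in enumerate(pi)]


def off_policy_gradient(pi, b, q):
    # Exact value of E_{a ~ b}[ rho(a) * psi(a) * Q(a) ] with rho = pi / b.
    # The ratio re-weights behavior-policy data toward the target policy.
    n = len(pi)
    g = [0.0] * n
    for a in range(n):
        rho = pi[a] / b[a]
        psi = compatible_features(pi, a)
        for i in range(n):
            g[i] += b[a] * rho * psi[i] * q[a]
    return g


def fisher(pi):
    # Fisher information matrix F = E_{a ~ pi}[ psi(a) psi(a)^T ].
    n = len(pi)
    F = [[0.0] * n for _ in range(n)]
    for a in range(n):
        psi = compatible_features(pi, a)
        for i in range(n):
            for j in range(n):
                F[i][j] += pi[a] * psi[i] * psi[j]
    return F


def matvec(F, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in F]


# Hypothetical toy problem: three actions, uniform behavior policy.
theta = [0.2, -0.1, 0.5]          # target-policy parameters (assumed)
pi = softmax(theta)               # target policy
b = [1.0 / 3, 1.0 / 3, 1.0 / 3]   # behavior policy (assumed)
q = [1.0, 0.0, 2.0]               # action values (assumed known here)

value = sum(p * qa for p, qa in zip(pi, q))
advantage = [qa - value for qa in q]

g = off_policy_gradient(pi, b, q)               # vanilla gradient, off-policy
natural_check = matvec(fisher(pi), advantage)   # F @ advantage

# If F w = g holds for w = advantage, then w is a natural-gradient direction.
max_gap = max(abs(x - y) for x, y in zip(g, natural_check))
print("vanilla gradient:", g)
print("natural-gradient direction (advantage):", advantage)
print("max |F w - g|:", max_gap)
```

In this toy setting the natural-gradient step moves each action preference in proportion to its advantage, independent of how likely the action currently is, whereas the vanilla gradient scales each component by pi(a). With neural network policies the same relationship is typically estimated from samples rather than computed exactly.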

Item Type: Conference Paper
Publication: Proceedings of the International Joint Conference on Neural Networks
Publisher: Institute of Electrical and Electronics Engineers Inc.
Additional Information: The copyright for this article belongs to the Author(s).
Keywords: Actor critic; Actor-critic algorithm; Behavior policy; Natural actor-critic; Neural-networks; Off-policy control; Optimal policies; Policy control; Policy setting; Reinforcement learning
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 15 Nov 2022 06:45
Last Modified: 15 Nov 2022 06:45
URI: https://eprints.iisc.ac.in/id/eprint/77925
