The Reinforce Policy Gradient Algorithm Revisited

Bhatnagar, S. The Reinforce Policy Gradient Algorithm Revisited. In: 2023 9th Indian Control Conference, ICC 2023, p. 177.

Abstract

We revisit the Reinforce policy gradient algorithm, which works with full cost returns obtained over random-length episodes. We propose a new Reinforce-type algorithm that estimates the policy gradient using a function measurement at a perturbed parameter, by means of a smoothed-functional gradient estimator. We observe that even though we estimate the gradient of the performance objective using sample performance (and not the sample gradient), the algorithm converges to a neighborhood of a local minimum. We further describe the main convergence result. © 2023 IEEE.
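The following is a minimal sketch of a one-measurement smoothed-functional gradient estimate, given only to illustrate the kind of estimator the abstract refers to. It is not taken from the paper: the Gaussian perturbation, the smoothing parameter `delta`, the function names, and the deterministic quadratic `toy_cost` are all assumptions made for illustration. In the policy gradient setting, `cost` would instead return the cost of one episode run under the perturbed policy parameter.

```python
import random


def sf_gradient_estimate(cost, theta, delta, rng):
    """One-measurement smoothed-functional gradient estimate (illustrative).

    Perturbs theta with a standard Gaussian vector scaled by delta (an
    assumed choice), takes one cost measurement at the perturbed point,
    and returns (perturbation / delta) * measurement.
    """
    perturbation = [rng.gauss(0.0, 1.0) for _ in theta]
    perturbed = [t + delta * p for t, p in zip(theta, perturbation)]
    measurement = cost(perturbed)  # a sample of performance, not of its gradient
    return [p * measurement / delta for p in perturbation]


def average_estimate(cost, theta, delta, num_samples, rng):
    """Average many independent estimates to expose their mean."""
    total = [0.0] * len(theta)
    for _ in range(num_samples):
        estimate = sf_gradient_estimate(cost, theta, delta, rng)
        total = [s + e for s, e in zip(total, estimate)]
    return [s / num_samples for s in total]


def toy_cost(theta):
    """Hypothetical stand-in for an episode cost: a simple quadratic."""
    return sum(t * t for t in theta)


if __name__ == "__main__":
    rng = random.Random(0)
    # The true gradient of toy_cost at [1, -2] is [2, -4].
    print(average_estimate(toy_cost, [1.0, -2.0], 0.5, 200000, rng))
```

A single estimate is very noisy, so the sketch averages many of them only to show that their mean tracks the gradient. A stochastic descent scheme would instead use one estimate per iteration with a decreasing step size, and in general the estimator targets the gradient of a smoothed version of the objective, which is consistent with convergence to a neighborhood of a local minimum rather than to the minimum itself.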

Item Type:	Conference Paper
Publication:	2023 9th Indian Control Conference, ICC 2023 - Proceedings
Department/Centre:	Division of Electrical Sciences > Computer Science & Automation
Date Deposited:	16 May 2024 08:48
Last Modified:	16 May 2024 08:48
URI:	https://eprints.iisc.ac.in/id/eprint/84513
