Deb, R and Gandhi, M and Bhatnagar, S (2022) Schedule Based Temporal Difference Algorithms. In: 58th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2022, 27 - 30 September 2022, Monticello.
PDF: ALLERTON_2022.pdf - Published Version (391kB)
Abstract
Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD(λ) is a popular class of algorithms to solve this problem. However, the weights assigned to different n-step returns in TD(λ), controlled by the parameter λ, decrease exponentially with increasing n. In this paper, we present a λ-schedule procedure that generalizes the TD(λ) algorithm to the case where the parameter λ can vary with the time step. This allows flexibility in weight assignment, i.e., the user can specify the weights assigned to different n-step returns by choosing a sequence {λ_t}, t ≥ 1. Based on this procedure, we propose an on-policy algorithm, TD(λ)-schedule, and two off-policy algorithms, GTD(λ)-schedule and TDC(λ)-schedule. We provide proofs of almost sure convergence for all three algorithms under a general Markov noise framework. © 2022 IEEE.
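To illustrate the general idea behind a λ-schedule, here is a minimal sketch of linear temporal-difference learning in which the trace-decay parameter λ_t is allowed to differ at every time step, rather than being held fixed as in standard TD(λ). The toy chain MDP, the function name, and the particular schedule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def td_lambda_schedule_episode(w, features, rewards, lambdas,
                               gamma=0.9, alpha=0.1):
    """Run one episode of TD learning with a time-varying lambda_t.

    features[t] is the feature vector phi(s_t); the last row is the
    (zero-feature) terminal state. lambdas[t] is the trace decay used
    at step t, so a constant sequence recovers ordinary TD(lambda).
    """
    z = np.zeros_like(w)                      # eligibility trace
    for t in range(len(rewards)):
        phi_t, phi_next = features[t], features[t + 1]
        # TD error for the transition s_t -> s_{t+1}
        delta = rewards[t] + gamma * (w @ phi_next) - w @ phi_t
        # Trace decays by the scheduled lambda_t instead of a fixed lambda
        z = gamma * lambdas[t] * z + phi_t
        w = w + alpha * delta * z
    return w

# Toy deterministic 3-state chain with one-hot features and a
# terminal reward of 1 (true values: 0.81, 0.9, 1.0 under gamma=0.9).
features = np.vstack([np.eye(3), np.zeros(3)])
rewards = [0.0, 0.0, 1.0]
lambdas = [1.0, 0.5, 0.25]        # an arbitrary example schedule
w = np.zeros(3)
for _ in range(200):
    w = td_lambda_schedule_episode(w, features, rewards, lambdas)
print(np.round(w, 2))
```

With a tabular (one-hot) representation the estimates converge to the true discounted values regardless of the schedule chosen; the schedule only changes how credit from later rewards is distributed across earlier states during learning.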
Item Type: | Conference Paper |
---|---|
Publication: | 2022 58th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2022 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Additional Information: | The copyright for this article belongs to the Author(s). |
Keywords: | Parameter estimation; Almost sure convergence; Data samples; Lambda schedules; Markov noise; Temporal-difference algorithms; Time steps; Value functions; Weight assignment; Reinforcement learning |
Department/Centre: | Division of Electrical Sciences > Computer Science & Automation |
Date Deposited: | 09 Jan 2023 09:08 |
Last Modified: | 09 Jan 2023 09:08 |
URI: | https://eprints.iisc.ac.in/id/eprint/78937 |