Deb, R and Gandhi, M and Bhatnagar, S (2022) Schedule Based Temporal Difference Algorithms. In: 58th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2022, 27 - 30 September 2022, Monticello.
PDF: ALLERTON_2022.pdf - Published Version (391kB)
Abstract
Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD(λ) is a popular class of algorithms to solve this problem. However, the weights assigned to different n-step returns in TD(λ), controlled by the parameter λ, decrease exponentially with increasing n. In this paper, we present a λ-schedule procedure that generalizes the TD(λ) algorithm to the case where the parameter λ can vary with the time step. This allows flexibility in weight assignment, i.e., the user can specify the weights assigned to different n-step returns by choosing a sequence {λ_t}, t ≥ 1. Based on this procedure, we propose an on-policy algorithm, TD(λ)-schedule, and two off-policy algorithms, GTD(λ)-schedule and TDC(λ)-schedule. We provide proofs of almost sure convergence for all three algorithms under a general Markov noise framework. © 2022 IEEE.
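To illustrate the general idea behind a λ-schedule, here is a minimal sketch of linear temporal-difference learning in which the trace-decay parameter λ_t is allowed to differ at every time step, rather than being held fixed as in standard TD(λ). The toy chain MDP, the function name, and the particular schedule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def td_lambda_schedule_episode(w, features, rewards, lambdas,
                               gamma=0.9, alpha=0.1):
    """Run one episode of TD learning with a time-varying lambda_t.

    features[t] is the feature vector phi(s_t); the last row is the
    (zero-feature) terminal state. lambdas[t] is the trace decay used
    at step t, so a constant sequence recovers ordinary TD(lambda).
    """
    z = np.zeros_like(w)                      # eligibility trace
    for t in range(len(rewards)):
        phi_t, phi_next = features[t], features[t + 1]
        # TD error for the transition s_t -> s_{t+1}
        delta = rewards[t] + gamma * (w @ phi_next) - w @ phi_t
        # Trace decays by the scheduled lambda_t instead of a fixed lambda
        z = gamma * lambdas[t] * z + phi_t
        w = w + alpha * delta * z
    return w

# Toy deterministic 3-state chain with one-hot features and a
# terminal reward of 1 (true values: 0.81, 0.9, 1.0 under gamma=0.9).
features = np.vstack([np.eye(3), np.zeros(3)])
rewards = [0.0, 0.0, 1.0]
lambdas = [1.0, 0.5, 0.25]        # an arbitrary example schedule
w = np.zeros(3)
for _ in range(200):
    w = td_lambda_schedule_episode(w, features, rewards, lambdas)
print(np.round(w, 2))
```

With a tabular (one-hot) representation the estimates converge to the true discounted values regardless of the schedule chosen; the schedule only changes how credit from later rewards is distributed across earlier states during learning.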
Item Type: | Conference Paper |
---|---|
Publication: | 2022 58th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2022 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Additional Information: | The copyright for this article belongs to the Author(s). |
Keywords: | Parameter estimation; Almost sure convergence; Data samples; Lambda schedules; Markov noise; Temporal-difference algorithms; Time steps; Value functions; Weight assignment; Reinforcement learning |
Department/Centre: | Division of Electrical Sciences > Computer Science & Automation |
Date Deposited: | 09 Jan 2023 09:08 |
Last Modified: | 09 Jan 2023 09:08 |
URI: | https://eprints.iisc.ac.in/id/eprint/78937 |