ePrints@IISc

An efficient reinforcement learning scheme for the confinement escape problem

Gurumurthy, V and Mohanty, N and Sundaram, S and Sundararajan, N (2024) An efficient reinforcement learning scheme for the confinement escape problem. In: Applied Soft Computing, 152.

PDF: app_sof_com_152_2024.pdf - Published Version (2MB). Restricted to registered users only.
Official URL: https://doi.org/10.1016/j.asoc.2024.111248

Abstract

Crucial real-world problems in robotics, such as trajectory planning during convoy missions and autonomous rescue missions, can be framed as a confinement escape problem (CEP), a type of pursuit-evasion game. In a typical CEP, an evader attempts to escape a confinement region by sequentially making decisions to plan an escape while the region is patrolled by multiple smart pursuers. The evader has a limited sensing range and knows neither the total number of pursuers nor their pursuit strategies, making it difficult to model the environment and obtain a generalizable escape strategy. In this paper, the CEP is formulated in a reinforcement learning (RL) framework to overcome these difficulties. The state function is designed to be independent of the total number of pursuers and the shape of the confinement region, making the RL approach scalable. To handle training consistency issues in deep RL and convergence issues due to sparse rewards, a Scaffolding Reflection based Reinforcement Learning (SR2L) approach is presented; SR2L employs an actor-critic method with a motion planner scaffold to accelerate training. Performance evaluation shows that SR2L trains twice as fast as existing state-of-the-art actor-critic RL methods, and that its convergence is more consistent than that of the corresponding conventional actor-critic RL methods and interactive reinforcement learning methods. Monte Carlo simulation results show that SR2L achieves escape times at least 28% and 10% faster than conventional RL methods and the motion planner, respectively, while having the lowest variance in escape times against different pursuit strategies. Ablation studies varying different environmental parameters demonstrate the scalability and generalizability of the proposed SR2L approach. © 2024 Elsevier B.V.
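The abstract describes two key mechanisms: a state encoding that is independent of the number of pursuers, and a motion planner scaffold that biases the actor-critic learner under sparse rewards. The sketch below is a hypothetical illustration, not the authors' implementation; the sector count N_BINS, the sensing radius SENSE_RANGE, the helpers encode_state, planner_action, and scaffolded_action, and the mixing probability beta are all assumptions chosen only to make the idea concrete.

    import numpy as np

    N_BINS = 8            # angular sectors around the evader (assumed granularity)
    SENSE_RANGE = 10.0    # evader's limited sensing radius (assumed units)

    def encode_state(evader_xy, pursuer_xys, boundary_dist):
        """Fixed-size state: per-sector nearest-pursuer proximity + boundary distance."""
        feat = np.zeros(N_BINS)
        for p in pursuer_xys:
            rel = np.asarray(p, dtype=float) - np.asarray(evader_xy, dtype=float)
            d = np.linalg.norm(rel)
            if d > SENSE_RANGE:
                continue  # a pursuer outside the limited sensing range is invisible
            theta = np.arctan2(rel[1], rel[0]) % (2 * np.pi)
            k = int(theta / (2 * np.pi) * N_BINS) % N_BINS
            feat[k] = max(feat[k], 1.0 - d / SENSE_RANGE)  # closer pursuer -> larger value
        # Feature length stays N_BINS + 1 no matter how many pursuers are sensed.
        return np.append(feat, min(boundary_dist / SENSE_RANGE, 1.0))

    def planner_action(state):
        """Stand-in motion planner: head toward the least-threatened angular sector."""
        sector = int(np.argmin(state[:N_BINS]))
        return 2 * np.pi * (sector + 0.5) / N_BINS  # commanded heading angle

    def scaffolded_action(policy_action, state, beta):
        """With probability beta (annealed during training), defer to the scaffold."""
        return planner_action(state) if np.random.rand() < beta else policy_action

    # Example: the state is 9-dimensional whether 2 or 20 pursuers are present.
    s = encode_state((0.0, 0.0), [(3.0, 1.0), (-2.0, 4.0)], boundary_dist=6.0)
    print(s.shape)                              # (9,)
    print(scaffolded_action(0.5, s, beta=0.8))  # planner or policy heading

In the paper, the scaffold's purpose is to densify the learning signal and stabilize training under sparse rewards; the sketch mimics that only loosely, by occasionally deferring to the planner while beta is annealed toward zero. The actual scaffolding-reflection mechanism of SR2L may differ from this simplification.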

Item Type: Journal Article
Publication: Applied Soft Computing
Publisher: Elsevier Ltd
Additional Information: The copyright for this article belongs to the publisher.
Keywords: Deep learning; Intelligent systems; Learning systems; Monte Carlo methods; Robot programming; Scaffolds; Actor-critic reinforcement learning; Confinement escape problem; Deep reinforcement learning; Escape problem; Expert-assisted learning; Interactive Reinforcement Learning; Motion planners; Reinforcement learning method; Reinforcement learnings; Scaffolding reflection; Reinforcement learning
Department/Centre: Division of Mechanical Sciences > Aerospace Engineering(Formerly Aeronautical Engineering)
Date Deposited: 01 Mar 2024 07:05
Last Modified: 01 Mar 2024 07:05
URI: https://eprints.iisc.ac.in/id/eprint/83918
