ePrints@IISc

Physics-Driven Machine Learning for Time-Optimal Path Planning in Stochastic Dynamic Flows

Chowdhury, R and Subramani, DN (2020) Physics-Driven Machine Learning for Time-Optimal Path Planning in Stochastic Dynamic Flows. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2-4 October 2020, Boston, pp. 293-301.

PDF: DDDAS_2020.pdf - Published Version (restricted to registered users)
Official URL: https://doi.org/10.1007/978-3-030-61725-7_34

Abstract

Optimal path planning of autonomous marine agents is important to minimize the operational costs of ocean observation systems. Within the context of DDDAS, we present a Reinforcement Learning (RL) framework for computing a dynamically adaptable policy that minimizes the expected travel time of autonomous vehicles between two points in stochastic dynamic flows. To forecast the stochastic dynamic environment, we utilize the reduced-order data-driven dynamically orthogonal (DO) equations. For planning, a novel physics-driven online Q-learning is developed. First, the distribution of exact time-optimal paths predicted by the stochastic DO Hamilton-Jacobi level-set partial differential equations is utilized to initialize the action-value function (Q-values) in a transfer learning approach. Next, the flow data collected by onboard sensors are utilized in a feedback loop to adaptively refine the optimal policy. For the adaptation, a simple Bayesian estimate of the environment is performed (the DDDAS data assimilation loop), and the inferred environment is used to update the Q-values in an ε-greedy exploration approach (the RL step). To validate our Q-learning solution, we compare it with a fully offline dynamic programming solution of the Markov Decision Problem corresponding to the RL framework. For this, novel numerical schemes to efficiently utilize the DO forecasts are derived, and a computationally efficient GPU implementation is completed. We showcase the new RL algorithm and elucidate its computational advantages by planning paths in a stochastic quasi-geostrophic double-gyre circulation. © 2020, Springer Nature Switzerland AG.
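The online RL step the abstract describes can be illustrated with a minimal tabular Q-learning sketch using ε-greedy exploration. Everything below is an illustrative assumption — a toy deterministic 1-D grid with a time-penalty reward and zero-initialized Q-values — not the paper's stochastic DO flow environment, its physics-driven (transfer learning) initialization, or its Bayesian assimilation loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a 1-D grid where the agent moves left/right toward a
# goal state. Reward of -1 per step plays the role of travel time to minimize.
n_states, n_actions = 10, 2   # actions: 0 = left, 1 = right
goal = n_states - 1
alpha, gamma, eps = 0.5, 0.95, 0.1

# In the paper, Q is initialized from the distribution of exact time-optimal
# paths (transfer learning); here we simply start from zeros.
Q = np.zeros((n_states, n_actions))

def step(s, a):
    """Deterministic toy transition; episode ends at the goal."""
    s2 = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    done = (s2 == goal)
    return s2, (0.0 if done else -1.0), done

for _ in range(500):                      # training episodes
    s = 0
    for _ in range(100):                  # step cap per episode
        # ε-greedy behavior policy (the exploration approach named above)
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Standard online Q-learning update with a greedy bootstrap target
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
        s = s2
        if done:
            break

# Greedy policy extracted from the learned Q-values: it should move right
# (action 1) in every state before the goal on this toy problem.
policy = np.argmax(Q, axis=1)
print(policy[:-1])
```

In the paper's setting, the transition model in `step` would instead come from the Bayesian estimate of the flow field, updated as onboard sensor data arrive, so the Q-values adapt online to the inferred environment.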

Item Type: Conference Paper
Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Science and Business Media Deutschland GmbH
Additional Information: The copyright for this article belongs to Springer Science and Business Media Deutschland GmbH.
Keywords: Bayesian networks; Data reduction; Dynamics; Reinforcement learning; Stochastic systems; Transfer learning; Travel time, Computational advantages; Computationally efficient; Markov decision problem; Ocean observation systems; Optimal path planning; Programming solutions; Stochastic dynamics; Time-optimal path planning, Dynamic programming
Department/Centre: Division of Interdisciplinary Sciences > Computational and Data Sciences
Date Deposited: 07 Feb 2023 09:24
Last Modified: 07 Feb 2023 09:24
URI: https://eprints.iisc.ac.in/id/eprint/79995
