Two timescale convergent Q-learning for sleep-scheduling in wireless sensor networks

Prashanth, LA and Chatterjee, Abhranil and Bhatnagar, Shalabh (2014) Two timescale convergent Q-learning for sleep-scheduling in wireless sensor networks. In: WIRELESS NETWORKS, 20 (8). pp. 2589-2604.

Official URL: http://dx.doi.org/10.1007/s11276-014-0762-6

Abstract

In this paper, we consider an intrusion detection application for Wireless Sensor Networks. We study the problem of scheduling the sleep times of the individual sensors, where the objective is to maximize the network lifetime while keeping the tracking error to a minimum. We formulate this problem as a partially-observable Markov decision process (POMDP) with continuous state-action spaces, in a manner similar to Fuemmeler and Veeravalli (IEEE Trans Signal Process 56(5), 2091-2101, 2008). However, unlike their formulation, we consider infinite-horizon discounted and average cost objectives as performance criteria. For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales, while employing function approximation. Feature-based representations and function approximation are necessary to handle the curse of dimensionality associated with the underlying POMDP. Our proposed algorithm incorporates a policy gradient update using a one-simulation simultaneous perturbation stochastic approximation estimate on the faster timescale, while the Q-value parameter (arising from a linear function approximation architecture for the Q-values) is updated in an on-policy temporal difference algorithm-like fashion on the slower timescale. The feature selection scheme employed in each of our algorithms manages the energy and tracking components in a manner that assists the search for the optimal sleep-scheduling policy. For the sake of comparison, in both discounted and average settings, we also develop a function approximation analogue of the Q-learning algorithm. This algorithm, unlike the two-timescale variant, does not possess theoretical convergence guarantees. Finally, we also adapt our algorithms to include a stochastic iterative estimation scheme for the intruder's mobility model, which is useful in settings where the latter is not known.
Our simulation results on a synthetic 2-dimensional network setting suggest that our algorithms result in better tracking accuracy at the cost of only a few additional sensors, in comparison to a recent prior work.
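
The two-timescale structure described in the abstract can be illustrated with a minimal sketch for the discounted-cost setting. This is not the authors' algorithm or code: the environment, cost function, features, policy parameterization, step-size schedules, and every function name below are assumptions chosen only to show how a one-simulation SPSA policy update (faster timescale) can be paired with an on-policy TD-like update of a linear Q-value parameter (slower timescale).

```python
import random


def policy_step(n):
    # Faster timescale: the larger of the two step sizes (assumed schedule).
    return 1.0 / (n + 10) ** 0.6


def q_step(n):
    # Slower timescale: q_step(n) / policy_step(n) tends to 0 as n grows.
    return 1.0 / (n + 10)


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def features(state, action):
    # Hypothetical features for a linear Q-value architecture.
    return [1.0, state, action, state * action, action * action]


def q_value(theta, state, action):
    return sum(t * f for t, f in zip(theta, features(state, action)))


def act(w, state):
    # Hypothetical parameterized policy: a sleep fraction in [0, 1].
    return clip(w[0] + w[1] * state, 0.0, 1.0)


def toy_env(state, action, rng):
    # Made-up dynamics: state stands for tracking uncertainty in [0, 1].
    cost = (1.0 - action) + 2.0 * state * action  # energy + tracking terms
    noise = rng.uniform(-0.1, 0.1)
    return cost, clip(0.5 * state + 0.5 * action + noise, 0.0, 1.0)


def perturbed(w, delta, rng):
    # Symmetric +/-1 Bernoulli perturbation, a common choice for SPSA.
    signs = [rng.choice((-1.0, 1.0)) for _ in w]
    return signs, [wi + delta * s for wi, s in zip(w, signs)]


def run(num_steps=2000, gamma=0.9, delta=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * 5   # Q-value parameter
    w = [0.5, 0.0]      # policy parameter
    state = 0.5
    signs, w_pert = perturbed(w, delta, rng)
    action = act(w_pert, state)
    for n in range(1, num_steps + 1):
        cost, next_state = toy_env(state, action, rng)
        # One-simulation SPSA: a single perturbed evaluation per estimate.
        q_now = q_value(theta, state, action)
        w = [clip(wi - policy_step(n) * q_now / (delta * s), -1.0, 1.0)
             for wi, s in zip(w, signs)]
        # The next action comes from the perturbed current policy (on-policy).
        signs, w_pert = perturbed(w, delta, rng)
        next_action = act(w_pert, next_state)
        # TD-like update of the linear Q-value parameter.
        td = (cost + gamma * q_value(theta, next_state, next_action)
              - q_value(theta, state, action))
        phi = features(state, action)
        theta = [t + q_step(n) * td * f for t, f in zip(theta, phi)]
        state, action = next_state, next_action
    return theta, w
```

Calling `run()` returns the final Q-value and policy parameters of this toy problem. The projection of the policy parameter onto [-1, 1] is an assumption made here to keep the iterates bounded; the sketch makes no claim about the convergence guarantees established in the paper.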

Item Type:	Journal Article
Publication:	WIRELESS NETWORKS
Additional Information:	Copyright for this article belongs to Springer, Van Godewijckstraat 30, 3311 GZ Dordrecht, Netherlands
Department/Centre:	Division of Electrical Sciences > Computer Science & Automation
Date Deposited:	14 Dec 2014 10:24
Last Modified:	14 Dec 2014 10:24
URI:	http://eprints.iisc.ac.in/id/eprint/50434

	Powered by EPrints		A service from The J.R.D. Tata Memorial Library Indian Institute of Science, Bengaluru-560012, India