Karmakar, Prasenjit and Bhatnagar, Shalabh (2018) Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning. In: Mathematics of Operations Research, 43 (1). pp. 130-151.