SATO, M. Reinforcement learning based on on-line EM algorithm. Advances in Neural Information Processing Systems. 1999, 11, 1052-1058
KONDA, V. R. Actor-Critic Algorithms. SIAM Journal on Control and Optimization. 2001
KONDA, V. R. Actor-Critic Algorithms. PhD Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology. 2002