WILLIAMS, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning. 1992, 8, 229-256
SUTTON, R. S., MCALLESTER, D., SINGH, S. and MANSOUR, Y. Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems. 2000, 12, 1057-1063