J. Audibert, S. Bubeck, and R. Munos, “Best arm identification in multi-armed bandits,” The 23rd Conference on Learning Theory, pp. 41-53, 2010.
E. Even-Dar, S. Mannor, and Y. Mansour, “Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems,” Journal of Machine Learning Research, vol. 7, pp. 1079-1105, 2006.