Policy Gradient Reinforcement Learning with Separated Knowledge: Environmental Dynamics and Action-Values in Policies

ISHIHARA SEIJI; IGARASHI HARUKAZU

J-GLOBAL ID: 201602212752649639 Reference number: 16A0268997

Original title (Japanese): 方策に関する知識を分離した方策こう配法 -環境ダイナミクスと行動価値による方策表現-

Volume: 136 Issue: 3 Page: 282-289 (J-STAGE) Publication year: 2016
JST Material Number: S0810A ISSN: 0385-4221 Document type: Article
Article type: Original paper Country of issue: Japan (JPN) Language: Japanese (JA)

Artificial intelligence

References (24):

(1) R. S. Sutton and A. G. Barto: Reinforcement Learning, MIT Press, Cambridge (1998)
(2) R. J. Williams: “Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning”, Machine Learning, Vol. 8, pp. 229-256 (1992)
(3) H. Kimura, M. Yamamura, and S. Kobayashi: “Reinforcement Learning in Partially Observable Markov Decision Processes: A Stochastic Gradient Method”, Journal of the Japanese Society for Artificial Intelligence, Vol. 11, No. 5, pp. 761-768 (1996) (in Japanese)
(4) L. C. Baird and A. W. Moore: “Gradient Descent for General Reinforcement Learning”, Advances in Neural Information Processing Systems 11, MIT Press, pp. 968-974 (1999)
