J-GLOBAL ID:201602212752649639   Reference number:16A0268997

Policy Gradient Reinforcement Learning with Separated Knowledge: Environmental Dynamics and Action-Values in Policies

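方策に関する知識を分離した方策こう配法 -環境ダイナミクスと行動価値による方策表現- (original Japanese title; literally "A Policy Gradient Method with Separated Knowledge about Policies: Policy Representation by Environmental Dynamics and Action Values")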
Author (2):
Material:
Volume: 136  Issue:  Page: 282-289 (J-STAGE)  Publication year: 2016 
JST Material Number: S0810A  ISSN: 0385-4221  Document type: Article
Article type: Original paper  Country of issue: Japan (JPN)  Language: JAPANESE (JA)
JST classification (1):
Artificial intelligence 
Reference (24):
  • (1) R. S. Sutton and A. G. Barto: Reinforcement Learning, MIT Press, Cambridge (1998)
  • (2) R. J. Williams: “Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning”, Machine Learning, Vol. 8, pp. 229-256 (1992)
  • (3) H. Kimura, M. Yamamura, and S. Kobayashi: “Reinforcement Learning in Partially Observable Markov Decision Processes: A Stochastic Gradient Method”, Journal of the Japanese Society for Artificial Intelligence, Vol. 11, No. 5, pp. 761-768 (1996) (in Japanese)
  • (4) L. C. Baird and A. W. Moore: “Gradient Descent for General Reinforcement Learning”, Advances in Neural Information Processing Systems 11, MIT Press, pp. 968-974 (1999)
