A Study on Learning Algorithms Using Replicator Dynamics with Mutation in Two-Player Zero-Sum Games

坂本充生; 阿部拳之; 岩崎敦

Article

J-GLOBAL ID: 202202273355469941 Reference number: 22A0754470


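Authors (3): 坂本充生, 阿部拳之, 岩崎敦
Source title:
Volume: 84th Issue: 2 Pages: 2.47-2.48 Publication date: February 17, 2022
JST material number: S0731A Material type: Conference proceedings (C)
Article type: Short report Country of publication: Japan (JPN) Language: Japanese (JA)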

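・This study examines the behavior of online learning algorithms based on replicator dynamics with mutation in two-player zero-sum games.
・It proposes MFTRL, an algorithm inspired by mutation, and examines what behavior three methods learn under a full-information feedback setting and a partial-feedback setting.
・The three methods are Follow the Regularized Leader (FTRL), Optimistic Follow the Regularized Leader (OFTRL), and Follow the Regularized Leader with mutation (MFTRL).
・Experiments show that, in both settings, the dynamics of MFTRL converge to an equilibrium without time averaging.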
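
The record does not give the update equations, so the following is only a minimal sketch of what "replicator dynamics with mutation" can look like, assuming the common textbook form in which each player's replicator update is perturbed by a term `mu * (c - x)` that pulls the strategy toward a reference strategy `c`. The game (rock-paper-scissors), the uniform reference strategy, the mutation rate `mu`, the Euler step size `dt`, and all function names are illustrative assumptions, not details taken from the paper, and this is not the MFTRL algorithm itself.

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player (illustrative assumption).
# The column player receives the negated payoff, so the game is zero-sum.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])


def rmd_step(x, y, mu, dt):
    """One Euler step of replicator dynamics with a mutation term of rate mu."""
    cx = np.full(len(x), 1.0 / len(x))  # assumed mutation target: uniform strategy
    cy = np.full(len(y), 1.0 / len(y))
    qx = A @ y        # payoff of each pure action for the row player
    qy = -A.T @ x     # payoff of each pure action for the column player
    dx = x * (qx - x @ qx) + mu * (cx - x)
    dy = y * (qy - y @ qy) + mu * (cy - y)
    return x + dt * dx, y + dt * dy


def simulate(mu, steps=20000, dt=0.01):
    """Run the discretized dynamics from a fixed non-equilibrium start."""
    x = np.array([0.6, 0.3, 0.1])
    y = np.array([0.2, 0.5, 0.3])
    for _ in range(steps):
        x, y = rmd_step(x, y, mu, dt)
    return x, y


def distance_from_uniform(x, y):
    """Largest coordinate-wise gap between either strategy and the uniform one."""
    return max(np.abs(x - 1.0 / 3.0).max(), np.abs(y - 1.0 / 3.0).max())


if __name__ == "__main__":
    for mu in (0.0, 0.1):
        x, y = simulate(mu)
        print(f"mu={mu}: distance from uniform = {distance_from_uniform(x, y):.4f}")
```

In this particular game the uniform strategy is the equilibrium, so with `mu = 0.1` the last iterate itself settles there, whereas with `mu = 0.0` the trajectory keeps cycling and never approaches it. In other games, or with a different reference strategy, the stationary point of such perturbed dynamics may differ from the exact equilibrium.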


Artificial intelligence

References (6):

M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NIPS, pp. 6629-6640, 2017.
S. Ito. Parameter-free multi-armed bandit algorithms with hybrid data-dependent regret bounds. In Conference on Learning Theory, pp. 2552-2583. PMLR, 2021.
M. Johanson, K. Waugh, M. H. Bowling, and M. Zinkevich. Accelerating best response calculation in large extensive games. In IJCAI, pp. 258-265, 2011.
C. Lee, H. Luo, C. Wei, and M. Zhang. Linear last-iterate convergence for matrix games and stochastic games. CoRR, abs/2006.09517, 2020.
P. Mertikopoulos, C. Papadimitriou, and G. Piliouras. Cycles in adversarial regularized learning. In ACM-SIAM Symposium on Discrete Algorithms, pp. 2703-2717, 2018.
