A Stochastic Bandit Problem with Lock-up Period Constraints

小宮山純平 (Junpei Komiyama); 佐藤一誠 (Issei Sato); 中川裕志 (Hiroshi Nakagawa)


J-GLOBAL ID: 201502258325092722  Control number: 15A0447067


Multi-armed Bandit Problem with Lock-up Periods


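Vol.: 6  No.: 3  Pages: 11-22 (web only)  Published: December 27, 2013
JST journal code: U0475A  ISSN: 1882-7780  Material type: Serial publication (A)
Article type: Original paper  Country of publication: Japan (JPN)  Language: Japanese (JA)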

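The bandit problem asks which of several arms (options) yields the highest reward, and it is one of the representative models of the exploration-exploitation trade-off. In recent years, motivated by applications such as information recommendation, optimal route search, optimization, and model selection, the bandit problem has attracted attention in machine learning and operations research. This study proposes a bandit problem with lock-up period constraints (periods during which the selected arm cannot be changed) and investigates which policies should be used. We show that many useful existing algorithms extend naturally to the setting with lock-up periods and evaluate their regret (a measure of performance). We show that this regret depends on the maximum length of the lock-up periods. Furthermore, we propose the Balancing and Recommendation (BaR) meta-algorithm, which can reduce regret when the lock-up periods are long. Finally, we present the results of computer experiments and discuss them in comparison with the theoretical results. (Author abstract)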
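The abstract states that existing algorithms extend naturally to lock-up periods. One possible illustration, using UCB1, is shown below. This is a minimal sketch under assumptions that do not appear in this record: rewards are Bernoulli, the lengths of consecutive lock-up periods are known in advance, and the standard UCB1 index is recomputed only when a period begins. The names `ucb1_with_lockup` and `lockup_lengths` are hypothetical. This is not the authors' algorithm, and it does not implement BaR.

```python
import math
import random
from typing import List, Sequence, Tuple


def ucb1_with_lockup(
    means: Sequence[float],
    lockup_lengths: Sequence[int],
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Hypothetical sketch: UCB1 where each chosen arm is kept for a whole lock-up period.

    Assumptions (not from the paper): Bernoulli rewards with the given means,
    and known lock-up period lengths, each at least 1 round.
    Returns the arm played in every round and the cumulative pseudo-regret.
    """
    if any(n < 1 for n in lockup_lengths):
        raise ValueError("every lock-up period must last at least one round")
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # rounds each arm has been played
    sums = [0.0] * k   # total observed reward per arm
    history: List[int] = []
    best = max(means)
    regret = 0.0

    for period, length in enumerate(lockup_lengths):
        if period < k:
            arm = period  # initialisation: one full period per arm
        else:
            t = sum(counts)  # rounds elapsed so far
            # Standard UCB1 index, evaluated only at the start of a period.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # The choice is locked in: play the same arm for the whole period.
        for _ in range(length):
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            history.append(arm)
            regret += best - means[arm]
    return history, regret


if __name__ == "__main__":
    arm_means = [0.3, 0.5, 0.7]
    # Same 300-round horizon, short versus long lock-up periods.
    _, short_regret = ucb1_with_lockup(arm_means, [1] * 300, seed=1)
    _, long_regret = ucb1_with_lockup(arm_means, [30] * 10, seed=1)
    print("lock-up length 1: ", short_regret)
    print("lock-up length 30:", long_regret)
```

The two runs hint at why the regret depends on the maximum lock-up length. With 30-round periods, the three initialization periods alone add 30 × (0.4 + 0.2) = 18 to the pseudo-regret. With 1-round periods they add only 0.6. Later suboptimal picks are likewise locked in for a full period.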

, , , , , , , , ,
, , , ,

Classification: Artificial intelligence; Other operations research methods

Cited references (25):

Robbins, H.: Some aspects of the sequential design of experiments, Bulletin of the AMS, Vol.58, pp.527-535 (1952).
Zhao, Q., Tong, L., Swami, A. and Chen, Y.: Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework, IEEE Journal on Selected Areas in Communications, Vol.25, pp.589-600 (online), DOI: 10.1109/JSAC.2007.070409 (2007).
Maron, O. and Moore, A.W.: Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation, NIPS, pp.59-66 (1993).
Mnih, V., Szepesvári, C. and Audibert, J.-Y.: Empirical Bernstein stopping, International Conference on Machine Learning, pp.672-679 (online), DOI: 10.1145/1390156.1390241 (2008).
Agarwal, A., Dekel, O. and Xiao, L.: Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback, Computational Learning Theory, pp.28-40 (2010).
