Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game

Authors: Ole-Christoffer Granmo, Sondre Glimsdal

Abstract

The two-armed bandit problem is a classical optimization problem where a decision maker sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus, one must balance between exploiting existing knowledge about the arms, and obtaining new information. Bandit problems are particularly fascinating because a large class of real world problems, including routing, Quality of Service (QoS) control, game playing, and resource allocation, can be solved in a decentralized manner when modeled as a system of interacting gambling machines.
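The exploration/exploitation trade-off described above can be illustrated with a minimal sketch of a Bayesian two-armed Bernoulli bandit. It keeps a Beta posterior per arm and pulls the arm whose posterior sample is larger (generic posterior sampling). This is an illustration under stated assumptions, not necessarily the accelerated scheme the paper proposes: the function name `thompson_two_armed`, the Bernoulli reward model, the uniform Beta(1, 1) prior, and the reward probabilities are all hypothetical choices.

```python
import random


def thompson_two_armed(p_arms, n_rounds, seed=0):
    """Play a two-armed Bernoulli bandit with Beta posterior sampling.

    p_arms   -- true reward probabilities (unknown to the decision maker)
    n_rounds -- number of sequential pulls
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    # Assumed uniform Beta(1, 1) prior on each arm's reward probability.
    alpha = [1, 1]  # 1 + observed rewards per arm
    beta = [1, 1]   # 1 + observed non-rewards per arm
    pulls = [0, 0]

    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior; uncertain arms
        # produce widely spread samples, which drives exploration.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(2)]
        arm = 0 if samples[0] >= samples[1] else 1

        # Pull the chosen arm and observe a random 0/1 reward.
        reward = 1 if rng.random() < p_arms[arm] else 0

        # Conjugate Bayesian update of the chosen arm's posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_two_armed([0.9, 0.1], 2000, seed=1))
```

With a large gap between the arms, the posterior for the better arm concentrates quickly, so most pulls go to it while the weaker arm is still sampled occasionally.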

Keywords: Bandit problems, Goore Game, Bayesian learning, Decentralized decision making, Quality of service control, Wireless sensor networks

Paper URL: https://doi.org/10.1007/s10489-012-0346-z