Algorithms for computing strategies in two-player simultaneous move games

Authors:

Highlights:

• We present algorithms for computing strategies in zero-sum simultaneous move games.

• The algorithms include exact algorithms and Monte Carlo sampling algorithms.

• We compare the algorithms in offline computation and online game playing.

• A novel exact algorithm dominates in offline equilibrium strategy computation.

• Novel sampling algorithms can guarantee convergence to optimal strategies.

Abstract

Simultaneous move games model discrete, multistage interactions where at each stage players simultaneously choose their actions. At each stage, a player does not know what action the other player will take, but otherwise knows the full state of the game. This formalism has been used to express games in general game playing and can also model many discrete approximations of real-world scenarios. In this paper, we describe both novel and existing algorithms that compute strategies for the class of two-player zero-sum simultaneous move games. The algorithms include exact backward induction methods with efficient pruning, as well as Monte Carlo sampling algorithms. We evaluate the algorithms in two different settings: the offline case, where computational resources are abundant and closely approximating the optimal strategy is a priority, and the online search case, where computational resources are limited and acting quickly is necessary. We perform a thorough experimental evaluation on six substantially different games for both settings. For the exact algorithms, the results show that our pruning techniques for backward induction dramatically improve the computation time required by the previous exact algorithms. For the sampling algorithms, the results provide unique insights into their performance and identify favorable settings and domains for different sampling algorithms.
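To make the exact approach concrete, below is a minimal, illustrative sketch (not the paper's implementation, and without the pruning techniques it introduces) of backward induction for a two-player zero-sum simultaneous move game: the values of the joint-action successors at each state form a matrix game, which is solved here by linear programming. The helper names (`terminal`, `utility`, `actions`, `transition`) are hypothetical placeholders for a concrete game definition.

```python
# Illustrative sketch only: exact backward induction for a two-player zero-sum
# simultaneous move game, solving the matrix game at each state with an LP.
# The game-interface functions passed to backward_induction are assumed, not
# taken from the paper.
import numpy as np
from scipy.optimize import linprog


def solve_matrix_game(M):
    """Return (value, row_strategy) of the zero-sum matrix game M (row player maximizes)."""
    rows, cols = M.shape
    # Shift payoffs so all entries are positive and the game value is strictly positive.
    shift = 1.0 - M.min()
    A = M + shift
    # LP formulation: minimize sum(x) subject to A^T x >= 1, x >= 0; value = 1 / sum(x).
    res = linprog(c=np.ones(rows),
                  A_ub=-A.T, b_ub=-np.ones(cols),
                  bounds=[(0, None)] * rows, method="highs")
    x = res.x
    value = 1.0 / x.sum()
    return value - shift, x * value


def backward_induction(state, terminal, utility, actions, transition):
    """Game value of `state` for the maximizing (row) player."""
    if terminal(state):
        return utility(state)
    acts1, acts2 = actions(state)
    # Build the stage matrix game from the values of the joint-action successors.
    M = np.array([[backward_induction(transition(state, a1, a2),
                                      terminal, utility, actions, transition)
                   for a2 in acts2] for a1 in acts1])
    value, _ = solve_matrix_game(M)
    return value
```

Because every state requires solving an LP over all joint actions of its children, the plain recursion above becomes expensive quickly, which is why pruning (as studied in the paper) matters for the exact setting.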

Keywords: Simultaneous move games, Markov games, Backward induction, Monte Carlo Tree Search, Alpha-beta pruning, Double-oracle algorithm, Regret matching, Counterfactual regret minimization, Game playing, Nash equilibrium

Article history: Received 14 July 2014, Revised 9 January 2016, Accepted 22 March 2016, Available online 1 April 2016, Version of Record 11 April 2016.

DOI: https://doi.org/10.1016/j.artint.2016.03.005