ACE-RL-Checkers: decision-making adaptability through integration of automatic case elicitation, reinforcement learning, and sequential pattern mining

Abstract

For agents that operate in environments where decision-making must take into account not only the environment but also the minimizing actions of an opponent (as in games), it is fundamental that the agent be able to progressively trace the profile of its adversaries, so that this profile aids in selecting appropriate actions. However, it would be unsuitable to build an agent whose decision-making system is based only on such a profile, as this would deprive the agent of its “own identity” and leave it at the mercy of its opponent. Following this direction, this study proposes an automatic Checkers player, called ACE-RL-Checkers, equipped with a dynamic decision-making module that adapts to the profile of the opponent over the course of the game. In this system, action selection is conducted through a composition of a multilayer perceptron neural network and a case library. The neural network represents the “identity” of the agent, i.e., it is an already trained, static decision-making module. The case library, in turn, represents the dynamic decision-making module of the agent and is generated by the Automatic Case Elicitation technique. This technique has a pseudo-random exploratory behavior, which allows the dynamic decision-making of the agent to be directed either by the opponent’s game profile or randomly. To avoid a high occurrence of pseudo-random decisions in the initial phases of the game, in which the agent has very little information about its opponent, this work proposes a new module based on sequential pattern mining that generates a base of experience rules extracted from human experts’ game records. This module improves the agent’s move selection in the initial phases of the game.
Experiments carried out in tournaments involving ACE-RL-Checkers and other agents related to this work confirm the superiority of the dynamic architecture proposed herein.
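
The composition of modules described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the three modules are prioritized, so the order used here (mined opening rule, then pseudo-random exploration, then case library, then static evaluator) is an assumption. The names `select_move`, `opening_rules`, `case_library`, `static_eval`, and `explore_prob` are hypothetical, as are the string encodings of states and moves; `static_eval` stands in for the trained multilayer perceptron.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Move = str   # hypothetical move notation, e.g. "11-15"
State = str  # hypothetical board encoding


def select_move(
    state: State,
    legal_moves: Sequence[Move],
    history: Sequence[Move],
    opening_rules: Dict[Tuple[Move, ...], Move],
    case_library: Dict[State, Dict[Move, float]],
    static_eval: Callable[[State, Move], float],
    rng: random.Random,
    explore_prob: float = 0.1,
) -> Move:
    """Pick a move by combining the three modules (assumed priority order)."""
    # 1. Experience rules: if the moves played so far match a mined
    #    sequence, follow the move that the rule recommends.
    rule_move = opening_rules.get(tuple(history))
    if rule_move in legal_moves:
        return rule_move

    # 2. Pseudo-random exploration, as in the exploratory behavior
    #    attributed to Automatic Case Elicitation (rate is an assumption).
    if rng.random() < explore_prob:
        return rng.choice(list(legal_moves))

    # 3. Dynamic module: reuse the best-rated stored case for this state.
    cases = case_library.get(state, {})
    known = {m: score for m, score in cases.items() if m in legal_moves}
    if known:
        return max(known, key=known.get)

    # 4. Static module: fall back to the trained evaluator (MLP stand-in).
    return max(legal_moves, key=lambda m: static_eval(state, m))
```

With `explore_prob=0.0` the selection is fully deterministic, which makes the fallback chain easy to inspect: a matching rule wins, otherwise the best stored case, otherwise the static evaluator.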