A multiagent player system composed of expert agents in specific game stages operating in a high-performance environment

Abstract

This paper proposes a new approach to the unsupervised learning process of multiagent player systems operating in a high-performance environment, in which cooperative agents are trained to become experts in specific stages of a game. The proposal is implemented in the Checkers automatic player named D-MA-Draughts, which is composed of 26 agents. The first agent specializes in the initial and intermediate stages of the game, whereas the remaining 25 specialize in endgame stages (defined as boards containing at most 12 pieces). Each agent consists of a multilayer neural network trained without human supervision through Temporal Difference methods. The best move is determined by the distributed search algorithm known as Young Brothers Wait Concept (YBWC). Each endgame agent chooses moves for a specific profile of endgame boards. These profiles are defined by a clustering process, performed by a Kohonen SOM (self-organizing map) network, on a database of endgame boards retrieved from real matches. Once trained, the D-MA-Draughts agents can act in a match according to two distinct game dynamics. The D-MA-Draughts architecture extends two preliminary versions: MP-Draughts, a multiagent system with a serial search algorithm, and D-VisionDraughts, a single agent with a distributed search algorithm. The gains of D-MA-Draughts are evaluated through several tournaments against these preliminary versions. The results show that D-MA-Draughts improves upon its predecessors by significantly reducing training time and endgame loops, thus beating them in several tournaments.
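The agent-selection idea described above (one agent for boards with more than 12 pieces, and one endgame agent per board profile found by a Kohonen SOM) can be sketched as follows. This is a minimal Python/NumPy sketch under stated assumptions, not the D-MA-Draughts implementation: boards are assumed to be encoded as 32-element vectors (one entry per playable square, ±1 for men, ±2 for kings, 0 for empty), the SOM is assumed to be a 1-D line of 25 units (one per endgame agent), and the names `random_board`, `train_som` and `select_agent` are hypothetical. The neural-network evaluation and the YBWC search are not shown.

```python
import numpy as np

# From the text: endgame boards contain at most 12 pieces, and 25 of the
# 26 agents are endgame specialists.
ENDGAME_PIECE_LIMIT = 12
N_ENDGAME_AGENTS = 25

# Assumed (hypothetical) encoding: 32 playable squares, +1/-1 for men,
# +2/-2 for kings, 0 for an empty square.
N_SQUARES = 32


def random_board(rng, n_pieces):
    """Hypothetical helper: a random board holding n_pieces pieces."""
    board = np.zeros(N_SQUARES, dtype=int)
    squares = rng.choice(N_SQUARES, n_pieces, replace=False)
    board[squares] = rng.choice([-2, -1, 1, 2], n_pieces)
    return board


def train_som(boards, n_units=N_ENDGAME_AGENTS, epochs=20, lr0=0.5, seed=0):
    """Online Kohonen SOM on a 1-D line of units (assumed topology).

    Each learned prototype stands for one endgame board profile.
    """
    rng = np.random.default_rng(seed)
    # Initialize prototypes from distinct training boards.
    protos = boards[rng.choice(len(boards), n_units, replace=False)].astype(float)
    sigma0 = n_units / 2.0
    total_steps = epochs * len(boards)
    step = 0
    for _ in range(epochs):
        for x in boards[rng.permutation(len(boards))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighborhood radius
            bmu = np.argmin(((protos - x) ** 2).sum(axis=1))   # best-matching unit
            grid_dist = np.abs(np.arange(n_units) - bmu)
            h = np.exp(-(grid_dist ** 2) / (2.0 * sigma ** 2))  # Gaussian neighborhood
            protos += lr * h[:, None] * (x - protos)   # pull units toward the sample
            step += 1
    return protos


def select_agent(board, protos):
    """Return 0 for the initial/intermediate agent, or 1..25 for an endgame agent."""
    if np.count_nonzero(board) > ENDGAME_PIECE_LIMIT:
        return 0
    # Endgame board: dispatch to the agent owning the closest profile.
    bmu = int(np.argmin(((protos - board) ** 2).sum(axis=1)))
    return 1 + bmu


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-in for the database of endgame boards from real matches.
    endgames = np.array([random_board(rng, int(rng.integers(2, 13))) for _ in range(200)])
    protos = train_som(endgames, epochs=2)
    print(select_agent(random_board(rng, 24), protos))  # 24 pieces, so agent 0
```

In this sketch the piece count alone decides between the opening/middlegame agent and the endgame pool, and only endgame boards are matched against the SOM prototypes; each endgame agent would then evaluate moves with its own network inside the search.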