An examination of evolved behavior in two reinforcement learning systems

Abstract

Using agent-based simulation experiments, we assess the relative performance of two Reinforcement Learning System (RLS) paradigms, the classical Learning Classifier System (LCS) and its enhancement, the Extended Classifier System (XCS), in playing the Iterated Prisoner's Dilemma (IPD) game. In prior research, the XCS outperformed the LCS on the Animats-and-Maze and Boolean Multiplexer test problems. Our work overlaps with and extends such efforts in that it allows assessment of each system's ability to (a) cope with delayed environmental feedback, (b) evolve irrational choice as the optimal behavior, and (c) cope with unpredictable input from the environment. We find that the XCS is considerably superior to the LCS, in terms of four key performance metrics, in IPD games against a deterministic, reactive game-playing agent (Tit-for-Tat), whereas the LCS does better against an unpredictable opponent (Rand), albeit with significant evolutionary effort. Further, upon examining each XCS enhancement in isolation, we see that specific LCS variants equipped with a single XCS feature do better than the traditional LCS model and/or the XCS model on particular metrics against both types of opponents but, again, usually with greater evolutionary effort. This suggests that if offline, rather than online, performance and specific performance goals are the focus, then one may construct relatively simpler LCS variants rather than full-fledged XCS systems. Further assessments using LCS variants equipped with combinations of XCS features should help clarify the synergistic impacts of these features on performance in the IPD.
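
For orientation, the two opponent types named above can be illustrated with a minimal sketch of the IPD environment. This is not the simulation used in the experiments: the payoff values are the conventional ones from the IPD literature and are an assumption here, as are all function names, and fixed policies stand in for the learner in place of an LCS or XCS.

```python
import random

# Assumed payoffs (conventional IPD values; not taken from this paper).
# Key: (learner_move, opponent_move) -> learner payoff. "C" cooperate, "D" defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(learner_history):
    """Deterministic, reactive opponent: cooperate first, then mirror the learner."""
    return learner_history[-1] if learner_history else "C"


def make_rand(seed=0):
    """Unpredictable opponent: picks C or D uniformly, ignoring the learner."""
    rng = random.Random(seed)

    def rand_opponent(learner_history):
        return rng.choice("CD")

    return rand_opponent


def play(learner_policy, opponent, rounds=10):
    """Play an iterated game and return the learner's cumulative payoff."""
    learner_history, opponent_history, total = [], [], 0
    for _ in range(rounds):
        learner_move = learner_policy(opponent_history)
        opponent_move = opponent(learner_history)
        total += PAYOFF[(learner_move, opponent_move)]
        learner_history.append(learner_move)
        opponent_history.append(opponent_move)
    return total


# Hypothetical fixed policies standing in for an evolved rule set.
def always_cooperate(opponent_history):
    return "C"


def always_defect(opponent_history):
    return "D"


if __name__ == "__main__":
    print("cooperate vs Tit-for-Tat:", play(always_cooperate, tit_for_tat))
    print("defect vs Tit-for-Tat:   ", play(always_defect, tit_for_tat))
    print("cooperate vs Rand:       ", play(always_cooperate, make_rand(1)))
    print("defect vs Rand:          ", play(always_defect, make_rand(1)))
```

Under these assumed payoffs, sustained cooperation earns more than sustained defection against Tit-for-Tat over ten rounds, even though defection is the rational choice in any single round. Against Rand, defection is never worse in any round, since the opponent's moves do not depend on the learner's.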