An exploratory rollout policy for imagination-augmented agents

Authors: Peng Liu, Yingnan Zhao, Wei Zhao, Xianglong Tang, Zichan Yang

Abstract

Typical reinforcement learning methods lack planning and thus require large amounts of training data to reach the expected performance. Imagination-Augmented Agents (I2A), a model-based method, learn to extract information from imagined trajectories to construct implicit plans, improving data efficiency and performance. In I2A, however, the imagined trajectories are generated by a shared rollout policy, which makes them similar to one another and therefore uninformative. We propose an exploratory rollout policy, E-I2A. When the agent's performance is poor, E-I2A produces diverse, more informative imagined trajectories; as the agent's performance improves with training, the trajectories generated by E-I2A become consistent with the agent's trajectories in the real environment and yield high rewards. To achieve this, we first quantify the novelty of a state by training an inverse dynamics model, and the agent selects the states with the highest novelty to generate diverse trajectories. Simultaneously, we train a distilled value-function model to estimate a state's expected return; this lets the agent imagine the states with the highest return, keeping the imagined trajectories consistent with the real ones. Finally, we propose an adaptive method that shifts rollouts from diverse early in training to consistent later, further improving the agent's performance. By providing more information at decision time, our method improves both performance and data efficiency. We evaluated E-I2A on several challenging domains, including MiniPacman and Sokoban, where it outperforms several baselines.
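The abstract does not give the exact mixing rule, but the adaptive idea — score imagined candidate states by novelty early in training and by estimated return later — can be sketched as follows. The linear interpolation by a `progress` weight and the function names are assumptions for illustration, not the paper's method:

```python
import numpy as np

def select_imagined_state(novelty, value, progress):
    """Pick the index of the next state to imagine.

    novelty  : per-candidate novelty scores (e.g. inverse-dynamics
               prediction error; higher = less familiar to the agent).
    value    : per-candidate expected returns from a distilled
               value-function model.
    progress : training progress in [0, 1]; 0 = early (explore),
               1 = late (exploit).
    The linear blend below is an assumed, illustrative mixing scheme.
    """
    novelty = np.asarray(novelty, dtype=float)
    value = np.asarray(value, dtype=float)
    # Early in training the score is dominated by novelty, yielding
    # diverse rollouts; later it is dominated by value, yielding
    # rollouts consistent with the agent's real behavior.
    score = (1.0 - progress) * novelty + progress * value
    return int(np.argmax(score))

# Early training: the most novel candidate (index 0) wins.
early = select_imagined_state([0.9, 0.1, 0.2], [0.0, 1.0, 0.5], progress=0.0)
# Late training: the highest-value candidate (index 1) wins.
late = select_imagined_state([0.9, 0.1, 0.2], [0.0, 1.0, 0.5], progress=1.0)
print(early, late)  # 0 1
```

At intermediate `progress` values the two criteria trade off smoothly, which is one simple way to realize the paper's transition from diverse to consistent imagined trajectories.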

Keywords: Model-based reinforcement learning, Implicit plan, Imagination-Augmented Agents, Exploratory rollout policy


Paper link: https://doi.org/10.1007/s10489-019-01484-7