Unified curiosity-driven learning with smoothed intrinsic reward estimation

Authors:

Highlights:

• We propose a novel distribution-aware and policy-aware unified curiosity-driven learning framework that unifies state novelty and state-action novelty. DAW enables the agent to explore states diversely, and PAW encourages the agent to explore states where the policy is uncertain about which action to take. The proposed approach improves the exploration ability of RL with a complete intrinsic reward;

• We propose to improve the robustness of policy learning by smoothing the intrinsic reward over a batch of transitions close to the current transition; we further propose an attention module that extracts task-relevant features for a more precise estimation of the intrinsic reward (see the illustrative sketch after this list);

• Extensive experiments on Atari games demonstrate the effectiveness of our approach.
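The highlights only summarize the method, so the following is a minimal, hypothetical sketch of the two ideas they describe: mixing a state-novelty signal with a policy-uncertainty (state-action) signal into one intrinsic reward, and smoothing that reward over transitions that are close to the current one in feature space. All function names, the nearest-neighbour novelty proxy, the entropy-based uncertainty proxy, the mixing weights, and the k-nearest-neighbour smoothing rule are illustrative assumptions, not the formulation used in the paper.

```python
# Hypothetical sketch of a unified, smoothed intrinsic reward (not the paper's exact method).
import numpy as np


def state_novelty(features: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Assumed state-novelty proxy: distance of each state feature to its
    nearest neighbour in a memory of previously visited state features."""
    dists = np.linalg.norm(features[:, None, :] - memory[None, :, :], axis=-1)
    return dists.min(axis=1)


def policy_uncertainty(action_probs: np.ndarray) -> np.ndarray:
    """Assumed state-action-novelty proxy: entropy of the policy's action
    distribution, which is high when the policy is unsure which action to take."""
    return -(action_probs * np.log(action_probs + 1e-8)).sum(axis=1)


def unified_intrinsic_reward(features, action_probs, memory,
                             w_daw: float = 0.5, w_paw: float = 0.5):
    """Mix the two novelty signals into one intrinsic reward; the weights are hypothetical."""
    return w_daw * state_novelty(features, memory) + w_paw * policy_uncertainty(action_probs)


def smoothed_intrinsic_reward(features, rewards, k: int = 5):
    """Smooth each transition's intrinsic reward by averaging it with the rewards of
    its k nearest transitions in feature space -- one plausible reading of
    'a batch of transitions close to the current transition'."""
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    neighbours = np.argsort(dists, axis=1)[:, :k]  # includes the transition itself
    return rewards[neighbours].mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(32, 8))          # stand-in for attention-extracted, task-relevant features
    probs = rng.dirichlet(np.ones(4), size=32)  # stand-in for the policy's action probabilities
    memory = rng.normal(size=(128, 8))          # stand-in for previously visited state features
    r_int = unified_intrinsic_reward(feats, probs, memory)
    r_smooth = smoothed_intrinsic_reward(feats, r_int)
    print(r_int[:3], r_smooth[:3])
```

In this reading, the attention module would supply the feature vectors used both for measuring novelty and for deciding which transitions count as "close" during smoothing, so that the averaging is done over task-relevant neighbours rather than raw observations.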

Keywords: Reinforcement learning, Unified curiosity-driven exploration, Robust intrinsic reward, Task-relevant feature

Article history: Received 29 September 2020, Revised 18 September 2021, Accepted 25 September 2021, Available online 26 September 2021, Version of Record 17 October 2021.

DOI: https://doi.org/10.1016/j.patcog.2021.108352