Within the scope of prediction: Shaping intrinsic rewards via evaluating uncertainty

Authors:

Highlights:

• Predict subsequent observations through predictive networks to guide the policy.

• Construct uncertainty from real and predicted observations to improve exploration.

• Propose a reward-penalty model based on environment information to reduce interference.

• Calculate intrinsic rewards using the uncertainty and reward-penalty models.

• Demonstrate good scalability across different environments and methods (see the sketch below).
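Taken together, the highlights describe a prediction-error style of intrinsic reward: a predictive network estimates the next observation, the discrepancy between the real and predicted observation is treated as uncertainty, and a reward-penalty term damps that bonus where it would interfere with the environment's own reward. Below is a minimal sketch of that shape, assuming a PyTorch setting; the names `PredictiveModel` and `intrinsic_reward`, the sign-based penalty rule, and the coefficients are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class PredictiveModel(nn.Module):
    """Hypothetical predictive network: maps (observation, action) to a
    predicted next observation, as in the first highlight."""
    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, obs_dim),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))

def intrinsic_reward(model: PredictiveModel,
                     obs: torch.Tensor,
                     act: torch.Tensor,
                     next_obs: torch.Tensor,
                     ext_reward: torch.Tensor,
                     penalty_coef: float = 0.5,
                     scale: float = 1.0) -> torch.Tensor:
    """Uncertainty = distance between real and predicted next observations.
    The penalty term (an assumption, not the paper's exact model) shrinks
    the bonus on transitions that already carry extrinsic reward."""
    with torch.no_grad():
        pred_next = model(obs, act)
        uncertainty = torch.norm(next_obs - pred_next, dim=-1)
        penalty = penalty_coef * (ext_reward.abs() > 0).float()
        return scale * uncertainty * (1.0 - penalty)
```

In use, the bonus would simply be added to the environment reward during policy optimization, e.g. `r_total = ext_reward + intrinsic_reward(model, obs, act, next_obs, ext_reward)`, while the predictive network is trained to minimize its prediction error on collected transitions.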

Keywords: Reinforcement learning, Policy optimization, Exploration, Prediction, Reward shaping

Article history: Received 2 September 2021, Revised 4 June 2022, Accepted 4 June 2022, Available online 13 June 2022, Version of Record 18 June 2022.

DOI: https://doi.org/10.1016/j.eswa.2022.117775