Within the scope of prediction: Shaping intrinsic rewards via evaluating uncertainty
Authors:
Highlights:
• Predict subsequent observations with a predictive network to guide the policy.
• Construct uncertainty from real and predicted observations to improve exploration.
• Propose a reward-penalty model based on environment information to reduce interference.
• Calculate intrinsic rewards using the uncertainty and reward-penalty models.
• Demonstrate good scalability across different environments and methods.
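The highlights above outline a prediction-error style of intrinsic reward: a forward model predicts the next observation, the gap between prediction and reality serves as an uncertainty signal, and a reward-penalty term modulates the resulting bonus. The following is a minimal sketch of that idea only; the class, function names, linear model, and penalty rule are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)


class ForwardModel:
    """Hypothetical linear forward model: predicts the next observation
    from the current observation and action (illustrative, not the paper's network)."""

    def __init__(self, obs_dim, act_dim, lr=1e-2):
        self.W = rng.normal(scale=0.1, size=(obs_dim, obs_dim + act_dim))
        self.lr = lr

    def predict(self, obs, act):
        x = np.concatenate([obs, act])
        return self.W @ x

    def update(self, obs, act, next_obs):
        # One gradient step on the squared prediction error; the error norm
        # doubles as the "uncertainty" used to shape the intrinsic reward.
        x = np.concatenate([obs, act])
        err = self.predict(obs, act) - next_obs
        self.W -= self.lr * np.outer(err, x)
        return float(np.linalg.norm(err))


def intrinsic_reward(uncertainty, ext_reward, beta=0.1, penalty=0.5):
    """Illustrative reward-penalty shaping: scale the uncertainty bonus,
    and subtract a penalty when the environment signals a negative outcome."""
    bonus = beta * uncertainty
    if ext_reward < 0:
        bonus -= penalty
    return bonus
```

In this sketch, repeated `update` calls on the same transition shrink the prediction error, so the intrinsic bonus decays for familiar states while novel transitions keep a large bonus, which matches the exploration mechanism the highlights describe at a high level.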
Keywords: Reinforcement learning, Policy optimization, Exploration, Prediction, Reward shaping
Article history: Received 2 September 2021, Revised 4 June 2022, Accepted 4 June 2022, Available online 13 June 2022, Version of Record 18 June 2022.
DOI: https://doi.org/10.1016/j.eswa.2022.117775