Combination of learning from non-optimal demonstrations and feedbacks using inverse reinforcement learning and Bayesian policy improvement

Authors:

Highlights:

• Learning from non-optimal demonstrations in the presence of human evaluative feedback.

• Extending an inverse reinforcement learning algorithm to incorporate human feedback.

• Our approach overcomes the challenge of non-optimality in demonstrations.

• Good performance of our approach in both simulated and real-world experiments.

Keywords: Teaching by demonstrations, Inverse reinforcement learning, Interactive learning, Human evaluative feedbacks

Review history: Received 17 March 2018, Revised 24 May 2018, Accepted 14 June 2018, Available online 23 June 2018, Version of Record 2 July 2018.

Official paper URL: https://doi.org/10.1016/j.eswa.2018.06.035