Combination of learning from non-optimal demonstrations and feedbacks using inverse reinforcement learning and Bayesian policy improvement

Authors:

Highlights:

• Learning from non-optimal demonstrations in the presence of human evaluative feedback.

• Extending an inverse reinforcement learning algorithm to incorporate human feedback.

• Our approach overcomes the challenge of non-optimality in demonstrations.

• Good performance of our approach in both simulated and real-world experiments.

Keywords: Teaching by demonstrations, Inverse reinforcement learning, Interactive learning, Human evaluative feedbacks

Review history: Received 17 March 2018, Revised 24 May 2018, Accepted 14 June 2018, Available online 23 June 2018, Version of Record 2 July 2018.

Paper URL: https://doi.org/10.1016/j.eswa.2018.06.035