Multi-robot inverse reinforcement learning under occlusion with estimation of state transitions

Authors:

Abstract

Inverse reinforcement learning (IRL), analogously to RL, refers to both the problem and the associated methods by which an agent that passively observes another agent's actions over time seeks to learn the latter's reward function. The learning agent is typically called the learner, while the observed agent is often an expert in popular applications such as learning from demonstrations. Some of the assumptions that underlie current IRL methods are impractical for many robotic applications. Specifically, they assume that the learner has full observability of the expert as it performs its task; that the learner has full knowledge of the expert's dynamics; and that there is always only one expert agent in the environment. These assumptions are particularly restrictive in our application scenario, where a subject robot is tasked with penetrating a perimeter patrol by two other robots after observing them from a vantage point. In our instance of this problem, the learner can observe at most 10% of the patrol.

Keywords: Inverse reinforcement learning, Robotics, Machine learning, Maximum entropy

Review history: Received 24 June 2017, Revised 27 June 2018, Accepted 3 July 2018, Available online 9 July 2018, Version of Record 25 July 2018.

Official paper URL: https://doi.org/10.1016/j.artint.2018.07.002