Accurate policy detection and efficient knowledge reuse against multi-strategic opponents

Authors:

Highlights:

• An intra-episode belief continuously guides policy selection.

• Episodic rewards and opponent models are used to infer the opponent's policy (see the sketch after this list).

• Our approach can track an opponent that switches its policy within an episode.

• Opponent policy switch frequencies do not degrade the agent’s performance.

• Previously learned knowledge is used against an unknown opponent type.
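The highlights describe a belief over opponent policy types that is updated within an episode from opponent models and then used to pick which previously learned response policy to reuse. The snippet below is a minimal illustrative sketch of that general idea, not the paper's implementation: the function names, the `opponent_models` and `response_policies` containers, and the Bayesian update form are all assumptions introduced here for clarity.

```python
import numpy as np

# Hypothetical inputs (not the authors' API):
# - opponent_models[k](state) -> probability distribution over the opponent's
#   actions under opponent type k (shape: [n_actions])
# - response_policies[k](state) -> the agent's action learned as a good
#   response against opponent type k

def update_belief(belief, state, opponent_action, opponent_models, eps=1e-8):
    """One Bayesian step: weight each type's prior probability by the
    likelihood its model assigns to the opponent action just observed."""
    likelihoods = np.array([m(state)[opponent_action] for m in opponent_models])
    posterior = belief * (likelihoods + eps)   # eps keeps types recoverable
    return posterior / posterior.sum()

def select_action(belief, state, response_policies):
    """Reuse the response policy associated with the currently most
    probable opponent type."""
    k = int(np.argmax(belief))
    return response_policies[k](state)
```

Because the belief is refreshed at every step rather than only between episodes, a scheme like this can react to an opponent that switches policies mid-episode, which is the behaviour the third highlight claims for the proposed approach.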


Keywords: Policy reuse, Opponent modeling, Reinforcement learning, Markov games, Option

Article history: Received 6 June 2021, Revised 29 December 2021, Accepted 8 February 2022, Available online 16 February 2022, Version of Record 25 February 2022.

DOI: https://doi.org/10.1016/j.knosys.2022.108404