Q-Managed: A new algorithm for a multiobjective reinforcement learning

Authors:

Highlights:

• All Pareto front policies have been learned.

• Hypervolume metric used to validate policies as optimal.

• Shape of Pareto front did not influence the number of policies found.

• Using knowledge gained in past iterations accelerated convergence to similar policies.
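As a rough illustration of the hypervolume metric mentioned in the highlights, the sketch below computes the hypervolume of a two-objective Pareto front. It assumes both objectives are maximized and that a reference point `ref` is chosen by the user; the function names, the sample returns, and the reference point are hypothetical and are not taken from the paper, whose own evaluation setup is not described on this page.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a Pareto-dominates b (both objectives maximized)."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the non-dominated points, sorted by the first objective."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded below by the reference point."""
    # Points that do not strictly improve on the reference contribute nothing.
    front = [p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]]
    # With the first objective descending, the second objective ascends,
    # so each point adds one horizontal strip of new area.
    front.sort(key=lambda p: p[0], reverse=True)
    volume = 0.0
    prev_y = ref[1]
    for x, y in front:
        volume += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return volume


# Hypothetical returns of four policies; (1, 1) is dominated and is ignored.
returns = [(1, 3), (2, 2), (3, 1), (1, 1)]
print(pareto_front(returns))
print(hypervolume_2d(returns, ref=(0, 0)))
```

A set of learned policies whose hypervolume matches that of the known Pareto front covers the same region of objective space, which is the sense in which the metric can serve as a check on optimality.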

Keywords: Multiobjective reinforcement learning, ε-constraint, Q-Learning, Pareto dominance, Single-policy approach, Hypervolume

Review history: Received 30 August 2019, Revised 1 November 2020, Accepted 2 November 2020, Available online 6 November 2020, Version of Record 24 January 2021.

Official paper URL: https://doi.org/10.1016/j.eswa.2020.114228