Policy gradient in Lipschitz Markov Decision Processes

Authors: Matteo Pirotta, Marcello Restelli, Luca Bascetta

Abstract

This paper exploits Lipschitz continuity properties of Markov Decision Processes to safely speed up policy-gradient algorithms. Starting from assumptions about the Lipschitz continuity of the state-transition model, the reward function, and the policies considered in the learning process, we show that both the expected return of a policy and its gradient are Lipschitz continuous w.r.t. the policy parameters. By leveraging these properties, we define policy-parameter updates that guarantee a performance improvement at each iteration. The proposed methods are empirically evaluated and compared with related approaches on different configurations of three popular control scenarios: the linear quadratic regulator, the mass-spring-damper system, and ship-steering control.
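The core idea behind safe updates of this kind is well illustrated by a standard result: if an objective's gradient is Lipschitz continuous with constant L, then gradient ascent with step size 1/L improves the objective monotonically. The sketch below is not the paper's algorithm; it demonstrates the principle on a stand-in concave quadratic objective, with all names (`J`, `grad_J`, `A`, `b`) chosen for illustration.

```python
import numpy as np

# Stand-in "performance" objective: a concave quadratic
# J(theta) = -0.5 theta^T A theta + b^T theta, whose gradient is
# L-Lipschitz with L equal to the largest eigenvalue of A.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])   # symmetric positive definite (assumed)
b = np.array([1.0, -2.0])

def J(theta):
    """Toy objective playing the role of the expected return."""
    return -0.5 * theta @ A @ theta + b @ theta

def grad_J(theta):
    """Gradient of the toy objective."""
    return -A @ theta + b

# Lipschitz constant of grad_J, and the resulting "safe" step size.
L = np.linalg.eigvalsh(A).max()
alpha = 1.0 / L

theta = np.zeros(2)
returns = [J(theta)]
for _ in range(50):
    # Gradient-ascent update with the Lipschitz-based step size:
    # guaranteed not to decrease J at any iteration.
    theta = theta + alpha * grad_J(theta)
    returns.append(J(theta))

# Performance is monotonically non-decreasing across updates.
assert all(r2 >= r1 for r1, r2 in zip(returns, returns[1:]))
```

In the paper's setting, the role of L is played by Lipschitz constants of the policy gradient derived from the smoothness assumptions on the model, reward, and policy class; the update above merely shows why such constants translate into per-iteration improvement guarantees.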

Keywords: Reinforcement learning, Markov Decision Process, Lipschitz continuity, Policy gradient algorithm


Paper link: https://doi.org/10.1007/s10994-015-5484-1