A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions

Author: Rémi Munos

Abstract

This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP), which introduces the value function (VF), the expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a non-linear differential equation, of first order for deterministic processes and of second order for stochastic ones, called the Hamilton-Jacobi-Bellman (HJB) equation. It is well known that there exist infinitely many generalized solutions (differentiable almost everywhere) to this equation, other than the VF. We show that gradient-descent methods may converge to one of these generalized solutions, thus failing to find the optimal control.
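
For reference, a sketch of the HJB equation mentioned above, in notation assumed here rather than drawn from the paper: for a deterministic, discounted infinite-horizon problem with state dynamics $\dot{x}(t) = f(x(t), u(t))$, reinforcement $r(x, u)$, and discount rate $\rho > 0$, a common first-order form is

$$\rho V(x) = \sup_{u \in U} \big\{ r(x, u) + \nabla V(x) \cdot f(x, u) \big\},$$

while in the stochastic case (diffusion coefficient $\sigma(x, u)$) a second-order term $\tfrac{1}{2} \operatorname{tr}\!\big( \sigma(x, u)\, \sigma(x, u)^{\top} \nabla^2 V(x) \big)$ is added inside the supremum. Since $V$ need not be differentiable everywhere, classical solutions may fail to exist; the value function is instead characterized as the unique viscosity solution of this equation.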

Keywords: reinforcement learning, dynamic programming, optimal control, viscosity solutions, finite difference and finite element methods, Hamilton-Jacobi-Bellman equation

Paper URL: https://doi.org/10.1023/A:1007686309208