Comprehensive comparison of online ADP algorithms for continuous-time optimal control

Authors: Yuanheng Zhu, Dongbin Zhao

Abstract

Online learning is an important property of adaptive dynamic programming (ADP). Online observations carry rich information about the system dynamics, and ADP algorithms can use them to learn the optimal control policy. This paper reviews research on online ADP algorithms for the optimal control of continuous-time systems. Through intensive study, ADP has developed toward model-free and data-efficient methods. After introducing the algorithms separately, we compare their performance on the same problem. This paper aims to provide a comprehensive understanding of continuous-time online ADP algorithms.

Keywords: Adaptive dynamic programming, Policy iteration, Integral reinforcement learning, Experience replay, Off-policy

Paper URL: https://doi.org/10.1007/s10462-017-9548-4