The Convergence of TD(λ) for General λ

Author: Peter Dayan

Abstract

The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case which only uses information from adjacent time steps to that involving information from arbitrary ones.
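To make the TD(λ) setting concrete, the sketch below shows the standard tabular TD(λ) prediction update with accumulating eligibility traces, where λ interpolates between the one-step (adjacent time step) case and updates driven by arbitrarily distant future information. This is an illustrative sketch only, not code from the paper; the function name, episode format, and parameter values are assumptions for the example.

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Tabular TD(lambda) value prediction with accumulating eligibility traces.

    `episodes` is an iterable of trajectories, each a list of
    (state, reward, next_state, done) tuples with integer states
    in [0, n_states). (Illustrative format, assumed for this sketch.)
    """
    V = np.zeros(n_states)
    for episode in episodes:
        e = np.zeros(n_states)           # eligibility traces, reset each episode
        for s, r, s_next, done in episode:
            # One-step TD error: bootstrapped target minus current estimate
            target = r + (0.0 if done else gamma * V[s_next])
            delta = target - V[s]
            e[s] += 1.0                  # accumulate trace for the visited state
            V += alpha * delta * e       # credit all recently visited states
            e *= gamma * lam             # lam = 0 recovers one-step TD(0)
    return V
```

With λ = 0 only the most recent state is updated (Sutton's adjacent-time-step case); larger λ spreads each TD error over earlier states, which is the general-λ regime whose convergence the paper addresses.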

Keywords: Reinforcement learning, temporal differences, asynchronous dynamic programming

Paper URL: https://doi.org/10.1023/A:1022632907294