The Convergence of TD(λ) for General λ

Author: Peter Dayan

Abstract

The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case which only uses information from adjacent time steps to that involving information from arbitrary ones.
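To make the TD(λ) setting concrete, the sketch below shows the standard tabular TD(λ) prediction update with accumulating eligibility traces, where λ interpolates between the one-step (adjacent time step) case and updates driven by arbitrarily distant future information. This is an illustrative sketch only, not code from the paper; the function name, episode format, and parameter values are assumptions for the example.

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Tabular TD(lambda) value prediction with accumulating eligibility traces.

    `episodes` is an iterable of trajectories, each a list of
    (state, reward, next_state, done) tuples with integer states
    in [0, n_states). (Illustrative format, assumed for this sketch.)
    """
    V = np.zeros(n_states)
    for episode in episodes:
        e = np.zeros(n_states)           # eligibility traces, reset each episode
        for s, r, s_next, done in episode:
            # One-step TD error: bootstrapped target minus current estimate
            target = r + (0.0 if done else gamma * V[s_next])
            delta = target - V[s]
            e[s] += 1.0                  # accumulate trace for the visited state
            V += alpha * delta * e       # credit all recently visited states
            e *= gamma * lam             # lam = 0 recovers one-step TD(0)
    return V
```

With λ = 0 only the most recent state is updated (Sutton's adjacent-time-step case); larger λ spreads each TD error over earlier states, which is the general-λ regime whose convergence the paper addresses.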

Keywords: Reinforcement learning, temporal differences, asynchronous dynamic programming

Paper URL: https://doi.org/10.1023/A:1022632907294