Technical Update: Least-Squares Temporal Difference Learning

Author: Justin A. Boyan

Abstract

TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22(1–3), 33–57) eliminates all stepsize parameters and improves data efficiency.
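For concreteness, below is a minimal sketch of the contrast the abstract draws, assuming a linear value function V(s) ≈ φ(s)·w and transitions supplied as (φ(s), r, φ(s′)) feature-vector tuples. The function names, the ridge term, and the default parameter values are illustrative assumptions, not from the paper.

```python
import numpy as np

def td0_update(w, phi, r, phi_next, gamma=0.9, alpha=0.05):
    """One incremental TD(0) step; alpha is the stepsize that must be tuned."""
    delta = r + gamma * (phi_next @ w) - (phi @ w)  # TD error
    return w + alpha * delta * phi

def lstd0(transitions, n_features, gamma=0.9, ridge=1e-6):
    """LSTD(0): solve A w = b built from all observed transitions, where
      A = sum_t phi_t (phi_t - gamma * phi_{t+1})^T
      b = sum_t phi_t * r_t
    No stepsize parameter is needed, unlike incremental TD(0)."""
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for phi, r, phi_next in transitions:
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
    # Small ridge term guards against a singular A on limited data
    # (an assumed regularization, not part of the original algorithm).
    return np.linalg.solve(A + ridge * np.eye(n_features), b)
```

Note how lstd0 extracts the weights in one solve from accumulated sufficient statistics, with no learning rate; this is the stepsize-free, data-efficient behavior the abstract attributes to LSTD.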

Keywords: reinforcement learning, temporal difference learning, value function approximation, linear least-squares methods

Paper URL: https://doi.org/10.1023/A:1017936530646