Fast Online Q(λ)

Authors: Marco Wiering, Jürgen Schmidhuber

Abstract

Q(λ)-learning uses TD(λ)-methods to accelerate Q-learning. The update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.
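The abstract's key idea — decaying a single global trace variable instead of every eligibility trace, and applying the postponed TD(λ) corrections to a Q-value only when that value is actually read — can be illustrated with a minimal sketch. This is an illustrative reconstruction, not the paper's exact algorithm: the class name `FastQLambda`, the replacing-trace handling, and all parameter values are assumptions, and the paper additionally renormalizes the global decay factor to avoid underflow in long episodes, which this sketch sidesteps by resetting at episode boundaries.

```python
class FastQLambda:
    """Sketch of lazy Q(lambda) updates: per-step work is O(|A|),
    independent of the number of states, because trace decay is
    tracked by one global factor instead of touching every pair."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, lam=0.8):
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.phi = 1.0        # global decay factor (gamma*lam)^t
        self.delta_sum = 0.0  # running sum of phi_k * delta_k over all steps k
        # per visited pair: (delta_sum snapshot at visit, 1/phi at visit)
        self.visited = {}

    def value(self, s, a):
        """Apply the postponed corrections for (s,a), then return Q(s,a).
        The correction is sum_{k>t} delta_k * (gamma*lam)^(k-t), recovered
        as (delta_sum - snapshot) / phi_at_visit."""
        if (s, a) in self.visited:
            snap, inv_phi = self.visited[(s, a)]
            self.q[s][a] += self.alpha * inv_phi * (self.delta_sum - snap)
            self.visited[(s, a)] = (self.delta_sum, inv_phi)
        return self.q[s][a]

    def update(self, s, a, r, s_next, done):
        """One online learning step."""
        q_sa = self.value(s, a)  # settle pending corrections first
        q_next = 0.0 if done else max(self.value(s_next, b)
                                      for b in range(len(self.q[s_next])))
        delta = r + self.gamma * q_next - q_sa
        self.q[s][a] += self.alpha * delta         # direct part (trace = 1)
        self.delta_sum += self.phi * delta         # book the postponed part
        self.visited[(s, a)] = (self.delta_sum, 1.0 / self.phi)
        self.phi *= self.gamma * self.lam

    def reset_traces(self):
        """Settle all pending corrections and clear traces (episode end)."""
        for (s, a) in list(self.visited):
            self.value(s, a)
        self.visited.clear()
        self.phi, self.delta_sum = 1.0, 0.0

# Tiny demo on a deterministic chain: state 0 -> 1 -> goal (reward 1 at goal).
agent = FastQLambda(n_states=3, n_actions=2, alpha=0.5, gamma=0.9, lam=0.8)
for _ in range(50):
    for s in (0, 1):
        agent.update(s, 0, 1.0 if s == 1 else 0.0, s + 1, done=(s == 1))
    agent.reset_traces()
```

Note that `value()` is where the laziness pays off: a pair's accumulated λ-return corrections are folded in only when its Q-value is queried, rather than being propagated eagerly to the whole table at every step.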

Keywords: reinforcement learning, Q-learning, TD(λ), online Q(λ), lazy learning

Paper link: https://doi.org/10.1023/A:1007562800292