Fast Online Q(λ)

Authors: Marco Wiering, Jürgen Schmidhuber

Abstract

Q(λ)-learning uses TD(λ)-methods to accelerate Q-learning. The update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.
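The abstract's key idea — decaying a single global trace variable instead of every eligibility trace, and applying the postponed TD(λ) corrections to a Q-value only when that value is actually read — can be illustrated with a minimal sketch. This is an illustrative reconstruction, not the paper's exact algorithm: the class name `FastQLambda`, the replacing-trace handling, and all parameter values are assumptions, and the paper additionally renormalizes the global decay factor to avoid underflow in long episodes, which this sketch sidesteps by resetting at episode boundaries.

```python
class FastQLambda:
    """Sketch of lazy Q(lambda) updates: per-step work is O(|A|),
    independent of the number of states, because trace decay is
    tracked by one global factor instead of touching every pair."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, lam=0.8):
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.phi = 1.0        # global decay factor (gamma*lam)^t
        self.delta_sum = 0.0  # running sum of phi_k * delta_k over all steps k
        # per visited pair: (delta_sum snapshot at visit, 1/phi at visit)
        self.visited = {}

    def value(self, s, a):
        """Apply the postponed corrections for (s,a), then return Q(s,a).
        The correction is sum_{k>t} delta_k * (gamma*lam)^(k-t), recovered
        as (delta_sum - snapshot) / phi_at_visit."""
        if (s, a) in self.visited:
            snap, inv_phi = self.visited[(s, a)]
            self.q[s][a] += self.alpha * inv_phi * (self.delta_sum - snap)
            self.visited[(s, a)] = (self.delta_sum, inv_phi)
        return self.q[s][a]

    def update(self, s, a, r, s_next, done):
        """One online learning step."""
        q_sa = self.value(s, a)  # settle pending corrections first
        q_next = 0.0 if done else max(self.value(s_next, b)
                                      for b in range(len(self.q[s_next])))
        delta = r + self.gamma * q_next - q_sa
        self.q[s][a] += self.alpha * delta         # direct part (trace = 1)
        self.delta_sum += self.phi * delta         # book the postponed part
        self.visited[(s, a)] = (self.delta_sum, 1.0 / self.phi)
        self.phi *= self.gamma * self.lam

    def reset_traces(self):
        """Settle all pending corrections and clear traces (episode end)."""
        for (s, a) in list(self.visited):
            self.value(s, a)
        self.visited.clear()
        self.phi, self.delta_sum = 1.0, 0.0

# Tiny demo on a deterministic chain: state 0 -> 1 -> goal (reward 1 at goal).
agent = FastQLambda(n_states=3, n_actions=2, alpha=0.5, gamma=0.9, lam=0.8)
for _ in range(50):
    for s in (0, 1):
        agent.update(s, 0, 1.0 if s == 1 else 0.0, s + 1, done=(s == 1))
    agent.reset_traces()
```

Note that `value()` is where the laziness pays off: a pair's accumulated λ-return corrections are folded in only when its Q-value is queried, rather than being propagated eagerly to the whole table at every step.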

Keywords: reinforcement learning, Q-learning, TD(λ), online Q(λ), lazy learning

Paper link: https://doi.org/10.1023/A:1007562800292