Risk-Sensitive Reinforcement Learning

Authors: Oliver Mihatsch, Ralph Neuneier

Abstract

Most reinforcement learning algorithms optimize the expected return of a Markov Decision Problem. Practice has taught us the lesson that this criterion is not always the most suitable because many applications require robust control strategies which also take into account the variance of the return. Classical control literature provides several techniques to deal with risk-sensitive optimization goals like the so-called worst-case optimality criterion exclusively focusing on risk-avoiding policies or classical risk-sensitive control, which transforms the returns by exponential utility functions. While the first approach is typically too restrictive, the latter suffers from the absence of an obvious way to design a corresponding model-free reinforcement learning algorithm.
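As a point of reference (this is not part of the paper's abstract), the classical risk-sensitive criterion mentioned above replaces the expected return by an exponential-utility objective. For a risk parameter \(\beta \neq 0\) (a symbol introduced here purely for illustration), a Taylor expansion around \(\beta = 0\) shows how the variance of the return enters the objective:

\[
U_\beta(\pi) \;=\; \frac{1}{\beta}\,\log \mathbb{E}_\pi\!\left[\exp\big(\beta R\big)\right]
\;\approx\; \mathbb{E}_\pi[R] \;+\; \frac{\beta}{2}\,\operatorname{Var}_\pi(R),
\]

where \(R\) denotes the (random) return under policy \(\pi\). Choosing \(\beta < 0\) penalizes variance and yields risk-averse behavior, \(\beta > 0\) rewards variance (risk-seeking), and \(\beta \to 0\) recovers the standard expected-return criterion. The worst-case criterion, by contrast, maximizes the minimum return attainable under the policy.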

Keywords: reinforcement learning, risk-sensitive control, temporal differences, dynamic programming, Bellman's equation

Paper link: https://doi.org/10.1023/A:1017940631555