Automatic programming of behavior-based robots using reinforcement learning

Authors:

Abstract

This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine Q learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance, two ways of propagating reinforcement values spatially across states. A real behavior-based robot called OBELIX is described that learns several component behaviors in an example task involving pushing boxes. A simulator for the box pushing task is also used to gather data on the learning techniques. A detailed experimental study using the real robot and the simulator suggests two conclusions: (1) the learning techniques are able to learn the individual behaviors, sometimes outperforming a handcoded program; and (2) using a behavior-based architecture speeds up reinforcement learning by converting the problem of learning a complex task into that of learning a simpler set of special-purpose reactive subtasks.
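To make the combination of temporal and spatial propagation concrete, below is a minimal sketch of Q learning augmented with Hamming-distance generalization over binary-feature states, in the spirit of what the abstract describes. The parameter names (ALPHA, GAMMA, SIGMA), the distance-decayed weighting, and the epsilon-greedy policy are assumptions for illustration, not the paper's actual OBELIX implementation.

```python
# Sketch: tabular Q learning where each update also nudges previously seen
# states that lie within a small Hamming distance of the visited state.
# ALPHA, GAMMA, SIGMA and the 1/(1+d) weighting are assumed values, not the
# paper's exact formulation.
import random
from collections import defaultdict

ALPHA = 0.5   # learning rate (assumed)
GAMMA = 0.9   # discount factor (assumed)
SIGMA = 1     # max Hamming distance over which reinforcement is spread (assumed)


def hamming(s1, s2):
    """Number of differing bits between two equal-length binary state tuples."""
    return sum(b1 != b2 for b1, b2 in zip(s1, s2))


class HammingQLearner:
    def __init__(self, actions):
        self.actions = actions
        self.q = defaultdict(float)   # (state, action) -> Q value
        self.seen_states = set()      # states visited so far

    def best_q(self, state):
        return max(self.q[(state, a)] for a in self.actions)

    def update(self, state, action, reward, next_state):
        """Temporal propagation via the standard Q-learning rule, plus spatial
        propagation to nearby states measured by Hamming distance."""
        self.seen_states.update([state, next_state])
        target = reward + GAMMA * self.best_q(next_state)
        # Usual Q-learning update for the state actually visited.
        self.q[(state, action)] += ALPHA * (target - self.q[(state, action)])
        # Weaker update for similar states, with weight decaying in distance.
        for other in self.seen_states:
            d = hamming(other, state)
            if 0 < d <= SIGMA:
                w = ALPHA / (1 + d)
                self.q[(other, action)] += w * (target - self.q[(other, action)])

    def choose(self, state, epsilon=0.1):
        """Epsilon-greedy action selection over the learned Q values."""
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])
```

The spatial step is what lets a single reinforcement signal inform many perceptually similar states at once, which is the intuition behind using Hamming distance (or clustering) to speed up learning on a real robot.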

Keywords:

Review history: Available online 19 February 2003.

Paper URL: https://doi.org/10.1016/0004-3702(92)90058-6