Learning heuristic functions for large state spaces

摘要

We investigate the use of machine learning to create effective heuristics for search algorithms such as ⁎IDA⁎ or heuristic-search planners such as FF. Our method aims to generate a sequence of heuristics from a given weak heuristic h0 and a set of unsolved training instances using a bootstrapping procedure. The training instances that can be solved using h0 provide training examples for a learning algorithm that produces a heuristic h1 that is expected to be stronger than h0. If h0 is so weak that it cannot solve any of the given instances we use random walks backward from the goal state to create a sequence of successively more difficult training instances starting with ones that are guaranteed to be solvable by h0. The bootstrap process is then repeated using hi in lieu of hi−1 until a sufficiently strong heuristic is produced. We test this method on the 24-sliding-tile puzzle, the 35-pancake puzzle, Rubikʼs Cube, and the 20-blocks world. In every case our method produces a heuristic that allows ⁎IDA⁎ to solve randomly generated problem instances quickly with solutions close to optimal.The total time for the bootstrap process to create strong heuristics for these large state spaces is on the order of days. To make the process effective when only a single problem instance needs to be solved, we present a variation in which the bootstrap learning of new heuristics is interleaved with problem-solving using the initial heuristic and whatever heuristics have been learned so far. This substantially reduces the total time needed to solve a single instance, while the solutions obtained are still close to optimal.