One model packs thousands of items with Recurrent Conditional Query Learning

Abstract

Recent studies have revealed that neural combinatorial optimization (NCO) outperforms conventional algorithms on many combinatorial optimization problems such as routing, but it is less efficient on more complicated tasks such as packing, which involves mutually conditioned action spaces. In this paper, we propose a Recurrent Conditional Query Learning (RCQL) method to solve both 2D and 3D packing problems. We first embed states with a recurrent encoder, and then apply attention with conditional queries derived from previous actions. The conditional query mechanism fills the information gap between learning steps, shaping the problem as a Markov decision process. Benefiting from the recurrence, a single RCQL model can handle packing problems of different sizes. Experimental results show that RCQL effectively learns strong heuristics for offline and online strip packing problems (SPPs), outperforming a wide range of baselines in space utilization ratio. Compared with state-of-the-art methods, RCQL reduces the average bin gap ratio by 1.83% in offline 2D 40-box cases and by 7.84% in 3D cases, and achieves a 5.64% higher space utilization ratio on SPPs with 1000 items.
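The abstract's core idea is attention whose query is conditioned on the previous action, so that each decision step carries information from the last placement. Below is a minimal sketch of such a conditional-query attention step in NumPy; the single-head dot-product formulation, the function name, and the weight matrices are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conditional_query_attention(item_states, prev_action_emb, Wq, Wk, Wv):
    """One attention step whose query is derived from the embedding of
    the previous action (hypothetical shapes and weights).

    item_states: (n, d) encoded item/state embeddings
    prev_action_emb: (d,) embedding of the previous placement action
    """
    q = prev_action_emb @ Wq                 # query conditioned on last action, (d,)
    K = item_states @ Wk                     # keys over candidate items, (n, d)
    V = item_states @ Wv                     # values over candidate items, (n, d)
    scores = K @ q / np.sqrt(q.shape[-1])    # scaled dot-product scores, (n,)
    attn = softmax(scores)                   # attention weights over items, (n,)
    context = attn @ V                       # context vector for the next decision, (d,)
    return context, attn
```

At each packing step the query comes from the previous placement's embedding, so every attention read is conditioned on what was already placed; a greedy policy could then pick the next item via `np.argmax(attn)`.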

Keywords: Deep reinforcement learning, Neural combinatorial optimization, Markov decision process, Packing problem

Article history: Received 21 May 2021, Revised 28 October 2021, Accepted 31 October 2021, Available online 2 November 2021, Version of Record 11 November 2021.

DOI: https://doi.org/10.1016/j.knosys.2021.107683