Heuristic algorithms based on deep reinforcement learning for quadratic unconstrained binary optimization

Authors:

Highlights:

Abstract

The unconstrained binary quadratic programming (UBQP) problem is a difficult combinatorial optimization problem that has been intensively studied in the past decades. Due to its NP-hardness, many heuristic algorithms have been developed to solve the UBQP. These algorithms are usually problem-tailored and therefore lack generality and scalability. To address these issues, a heuristic algorithm based on deep reinforcement learning (DRLH) is proposed in this paper. It takes problem-specific features as input and uses a neural network model (referred to as NN) to guide the selection of a variable at each solution construction step. To further improve speed and efficiency, two algorithm variants, a simplified DRLH (DRLS) and DRLS with hill climbing (DRLS-HC), are developed as well. These three algorithms are examined through extensive experiments in comparison with well-known heuristic algorithms from the literature. Experimental results show that DRLH, DRLS, and DRLS-HC outperform their competitors in terms of both solution quality and computational efficiency. Specifically, DRLH achieves the best-quality results, while DRLS offers high-quality solutions in a very short time. By adding a hill-climbing procedure to DRLS, the resulting DRLS-HC algorithm obtains results of almost the same quality as DRLH while using, on average, about five times less computing time. Additional experiments on large-scale instances and various data distributions verify the generality and scalability of the proposed algorithms, and the results on benchmark instances indicate their applicability to practical problems.
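For context, UBQP asks for a binary vector x in {0,1}^n that maximizes f(x) = x^T Q x for a given n-by-n matrix Q. The Python sketch below illustrates the objective, the gain of a single 1-flip move, and a basic best-improvement hill-climbing pass of the kind DRLS-HC appends to DRLS. The random starting solution, symmetric Q, and stopping rule are illustrative assumptions only; they stand in for the paper's NN-guided construction and are not its exact procedure.

```python
import numpy as np

def ubqp_objective(Q, x):
    """Objective value x^T Q x for a binary vector x."""
    return float(x @ Q @ x)

def flip_gain(Q, x, i):
    """Change in objective from flipping bit i (Q assumed symmetric)."""
    delta = 1 - 2 * x[i]
    return delta * (Q[i, i] + 2 * (Q[i] @ x - Q[i, i] * x[i]))

def hill_climb(Q, x):
    """Best-improvement 1-flip moves until no positive gain remains
    (an illustrative hill-climbing pass, not the paper's exact rules)."""
    improved = True
    while improved:
        gains = np.array([flip_gain(Q, x, i) for i in range(len(x))])
        i = int(np.argmax(gains))
        improved = gains[i] > 1e-12
        if improved:
            x[i] = 1 - x[i]
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    Q = rng.integers(-10, 11, size=(n, n)).astype(float)
    Q = (Q + Q.T) / 2                   # symmetrize the coefficient matrix
    x = rng.integers(0, 2, size=n)      # random start in place of NN-guided construction
    print("before hill climbing:", ubqp_objective(Q, x))
    x = hill_climb(Q, x)
    print("after hill climbing: ", ubqp_objective(Q, x))
```

In the proposed algorithms, the role played here by a random start would instead be filled by a step-by-step construction in which the neural network scores candidate variables; the 1-flip gain computation shown above is the standard ingredient such local improvement relies on.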

Keywords: Unconstrained binary quadratic programming, Heuristic algorithm, Deep reinforcement learning, Neural network

Article history: Received 10 March 2020, Revised 28 June 2020, Accepted 3 August 2020, Available online 8 August 2020, Version of Record 22 August 2020.

Paper link: https://doi.org/10.1016/j.knosys.2020.106366