Average reward adjusted deep reinforcement learning for order release planning in manufacturing

Abstract:

One of the key challenges in production planning, especially in discrete manufacturing, is to determine when to release which orders to the shop floor. The major aim of this planning task is to balance Work-In-Process (WIP) and utilisation levels with the timely completion of orders. Two characteristics make this task particularly hard: (i) the highly nonlinear relationship between WIP, flow times and output, and (ii) the dynamically changing environment. Nonetheless, most state-of-the-art models use static lead times to address this problem. Only recently have some papers set lead times dynamically based on flow-time forecasts to react to the dynamic operational characteristics, reporting promising results. This paper contributes to this line of research by presenting an order release model that uses reinforcement learning (RL) to set lead times dynamically over time. The applied RL agent is specifically designed for processes with periodic feedback and a highly variable context. We compare the performance of our new RL algorithm to static order release models and state-of-the-art deep Q-learning agents using a multi-stage, multi-product flow-shop simulation model. The results show that, especially in scenarios with high utilisation, our proposed method outperforms the other approaches.
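The average-reward adjustment referenced in the title can be illustrated with tabular differential (average-reward) Q-learning, where the temporal-difference error subtracts a running estimate of the average reward instead of applying a discount factor. The following is a minimal sketch only, not the paper's deep RL agent: the two-state MDP, its rewards ("hold" vs. "release" an order), and all hyperparameters are illustrative assumptions.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP standing in for an order-release
# decision: action 0 = "hold order", action 1 = "release order".
# P[s, a] -> next state, R[s, a] -> immediate reward (assumed values).
P = np.array([[0, 1],
              [0, 1]])
R = np.array([[0.0, 1.0],
              [0.0, 2.0]])

def differential_q_learning(steps=20000, alpha=0.1, eta=0.01,
                            epsilon=0.1, seed=0):
    """Tabular average-reward (differential) Q-learning sketch."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((2, 2))   # differential action values
    rho = 0.0              # running estimate of the average reward
    s = 0
    for _ in range(steps):
        # epsilon-greedy behaviour policy
        a = int(rng.integers(2)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next, r = int(P[s, a]), R[s, a]
        # TD error uses (r - rho) in place of discounting
        delta = r - rho + Q[s_next].max() - Q[s, a]
        Q[s, a] += alpha * delta
        rho += eta * alpha * delta  # average-reward estimate tracks the TD error
        s = s_next
    return Q, rho

Q, rho = differential_q_learning()
```

In this toy model, always releasing (action 1) keeps the system in the high-reward state, so the learned policy prefers action 1 in both states and `rho` approaches the optimal average reward of 2. Removing the discount factor this way is what makes the criterion an average-reward one, which suits continuing, periodic-feedback processes such as order release.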

Keywords: Operations research, Production planning, Order release, Machine learning, Reinforcement learning

Article history: Received 9 November 2021, Revised 5 April 2022, Accepted 5 April 2022, Available online 12 April 2022, Version of Record 23 April 2022.

DOI: https://doi.org/10.1016/j.knosys.2022.108765