An efficient data evacuation strategy using multi-objective reinforcement learning

Abstract

When a disaster occurs, two key factors in improving the efficiency of data evacuation are utilizing residual network resources as fully as possible and allocating a reasonable bandwidth ratio among concurrent evacuation transfers. However, because evacuation is often carried out in a large-scale, highly complex network, these two factors can conflict in some cases. It is therefore difficult to achieve or approximate the optimal evacuation solution through single-objective optimization alone. To make better use of network transmission capability in data evacuation, we leverage multi-objective reinforcement learning to simultaneously maximize the total evacuation flow and allocate proportional bandwidth to concurrent evacuation transfers. We design a reward vector for the two objectives and update the Pareto approximate set over multiple state steps to approach the optimal solution. In every step, we use a marked evacuation routing search based on heuristic information to construct the action space. To improve the quality of candidate actions, we search for one available evacuation path for every evacuation transfer and adjust the bandwidth allocation to reduce the deviation between the bandwidth ratio and the data amount ratio. For action selection, we propose a roulette-based Chebyshev scalarization function that optimizes the weight selection process for multiple objectives and enforces exploration to avoid falling into a local optimum. The simulation results demonstrate that our new strategy solves the weight selection problem successfully and achieves better performance with higher transmission efficiency.
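To make the action-selection idea concrete, the following is a minimal sketch of one possible reading of roulette-based Chebyshev scalarization. It is not the paper's implementation. It assumes that each candidate action carries a vector of estimated values (one per objective), that the utopian point is the per-objective maximum plus a small constant `tau`, and that roulette selection draws an action with probability inversely proportional to its weighted Chebyshev distance from that point. The names `chebyshev_distance`, `roulette_select`, `tau`, and the example weights are hypothetical.

```python
import random


def chebyshev_distance(q_values, utopia, weights):
    """Weighted Chebyshev distance from one action's value vector to the utopian point."""
    return max(w * abs(q - z) for q, z, w in zip(q_values, utopia, weights))


def roulette_select(q_table, weights, rng=random, tau=1e-3):
    """Pick an action index by roulette-wheel sampling (assumed scheme).

    q_table: list of per-action value vectors, one entry per objective.
    weights: strictly positive objective weights.
    Actions closer to the utopian point get a larger slice of the wheel,
    but every action keeps a nonzero chance, which forces some exploration.
    """
    if not q_table:
        raise ValueError("q_table must contain at least one action")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")

    n_obj = len(weights)
    # Utopian point: best observed value per objective, nudged up by tau
    # so that every distance is strictly positive.
    utopia = [max(q[o] for q in q_table) + tau for o in range(n_obj)]
    distances = [chebyshev_distance(q, utopia, weights) for q in q_table]

    # Smaller distance means larger fitness (assumption: inverse-distance fitness).
    fitness = [1.0 / d for d in distances]
    pick = rng.random() * sum(fitness)

    cumulative = 0.0
    for index, f in enumerate(fitness):
        cumulative += f
        if pick < cumulative:
            return index
    return len(fitness) - 1  # guard against floating-point round-off


if __name__ == "__main__":
    # Two objectives: total evacuation flow and bandwidth-ratio fairness (illustrative values).
    candidate_actions = [[8.0, 2.0], [5.0, 5.0], [1.0, 9.0]]
    print(roulette_select(candidate_actions, weights=[0.5, 0.5]))
```

A purely greedy rule would always take the action with the smallest Chebyshev distance. The roulette step in this sketch trades some of that greediness for exploration, which matches the stated goal of avoiding a local optimum.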