Fast-DRD: Fast decentralized reinforcement distillation for deadline-aware edge computing

Authors:

Highlights:

• To the best of our knowledge, Fast-DRD is the first work to model the deadline-aware offloading problem as a Dec-POMDP (a standard Dec-POMDP tuple is sketched after this list). Fast-DRD enables distributed offloading and decentralized learning for loosely coupled edge servers with low synchronization requirements, which is especially valuable when the data space is unknown or communication with the central cloud is poor.

• Random exploration produces a non-i.i.d. data space and hampers DRL efficiency at the edge. Building on the Dec-POMDP formulation, we put forward the trajectory observation history (TOH) as the basic unit of distillation. A TOH decomposes the optimization goal into ephemeral estimated rewards and accumulated real rewards, harvesting valuable knowledge while filtering out noise in DRL (see the sketch after this list).

• We conduct simulation experiments on multi-server edge-computing offloading. The results show that, compared with naive Policy Distillation, Fast-DRD's two-stage distillation dramatically reduces the amount of data exchanged: learning time and data-interaction cost both decrease by nearly 90%. In a complex environment of heterogeneous users under partial observation, the offloading models learned decentrally by Fast-DRD still maintain their offloading efficiency.
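For reference, the Dec-POMDP mentioned in the first highlight has the standard tuple form below. This listing does not give the paper's own notation, so the symbols follow the usual Dec-POMDP definition rather than Fast-DRD's specific formulation.

```latex
% Standard Dec-POMDP tuple (usual notation; the paper's own symbols may differ):
%   I       -- set of agents (here, the edge servers)
%   S       -- global state space
%   A_i     -- action space of agent i (offloading decisions)
%   T       -- state-transition function
%   R       -- shared reward function
%   Omega_i -- observation space of agent i (each server observes only locally)
%   O       -- observation function
%   gamma   -- discount factor
\mathcal{M} = \bigl\langle \mathcal{I},\; \mathcal{S},\; \{\mathcal{A}_i\}_{i \in \mathcal{I}},\;
T,\; R,\; \{\Omega_i\}_{i \in \mathcal{I}},\; O,\; \gamma \bigr\rangle
```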
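The TOH decomposition in the second highlight can be pictured with a minimal sketch. All names here (`TrajectoryObservationHistory`, `select_for_distillation`, the quantile cutoff) are hypothetical illustrations, not the paper's implementation: a TOH carries per-step (ephemeral) estimated rewards alongside the realized rewards, and only trajectories whose accumulated real reward is high enough are kept as distillation units.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrajectoryObservationHistory:
    """Hypothetical container for one TOH: the per-step observations,
    actions, and rewards collected by a single edge server."""
    observations: List[list] = field(default_factory=list)
    actions: List[int] = field(default_factory=list)
    estimated_rewards: List[float] = field(default_factory=list)  # ephemeral per-step estimates
    real_rewards: List[float] = field(default_factory=list)       # realized rewards

    def accumulated_real_reward(self, gamma: float = 0.99) -> float:
        """Discounted sum of the realized rewards over the trajectory."""
        return sum(r * gamma ** t for t, r in enumerate(self.real_rewards))


def select_for_distillation(histories, gamma=0.99, quantile=0.5):
    """Keep only TOHs whose accumulated real reward clears the given
    quantile -- a stand-in for 'harvesting valuable knowledge and
    filtering out the noise'."""
    scores = [h.accumulated_real_reward(gamma) for h in histories]
    cutoff = sorted(scores)[int(len(scores) * quantile)]
    return [h for h, s in zip(histories, scores) if s >= cutoff]


# Usage: keep the better half of eight toy trajectories.
histories = []
for seed in range(8):
    h = TrajectoryObservationHistory()
    h.real_rewards = [0.1 * seed] * 5
    histories.append(h)
print(len(select_for_distillation(histories)))  # -> 4
```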

Keywords: Data distillation, Deep reinforcement learning, Deadline-aware, Decentralized learning, Edge computing

Article history: Received 16 August 2021, Revised 4 December 2021, Accepted 13 December 2021, Available online 15 January 2022, Version of Record 15 January 2022.

DOI: https://doi.org/10.1016/j.ipm.2021.102850