Fast-DRD: Fast decentralized reinforcement distillation for deadline-aware edge computing

Authors:

Highlights:

• To the best of our knowledge, Fast-DRD is the first work to model the deadline-aware offloading problem as a Dec-POMDP (a standard Dec-POMDP tuple is sketched after this list). Fast-DRD enables distributed offloading and decentralized learning for loosely coupled edge servers with low synchronization requirements, which is especially valuable when the data space is unknown or communication with the central cloud is poor.

• Random exploration produces a non-i.i.d. data space and hampers DRL efficiency at the edge. Building on the Dec-POMDP formulation, we put forward the trajectory observation history (TOH) as the basic unit of distillation. A TOH decomposes the optimization goal into ephemeral estimated rewards and accumulated real rewards, harvesting valuable knowledge while filtering out noise in DRL (see the sketch after this list).

• We conduct simulation experiments on multi-server edge-computing offloading. The results show that, compared with naive Policy Distillation, Fast-DRD's two-stage distillation dramatically reduces the amount of data exchanged: learning time and data-interaction cost both decrease by nearly 90%. In a complex environment of heterogeneous users under partial observation, the offloading models learned decentrally by Fast-DRD still maintain their offloading efficiency.
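For reference, the Dec-POMDP mentioned in the first highlight has the standard tuple form below. This listing does not give the paper's own notation, so the symbols follow the usual Dec-POMDP definition rather than Fast-DRD's specific formulation.

```latex
% Standard Dec-POMDP tuple (usual notation; the paper's own symbols may differ):
%   I       -- set of agents (here, the edge servers)
%   S       -- global state space
%   A_i     -- action space of agent i (offloading decisions)
%   T       -- state-transition function
%   R       -- shared reward function
%   Omega_i -- observation space of agent i (each server observes only locally)
%   O       -- observation function
%   gamma   -- discount factor
\mathcal{M} = \bigl\langle \mathcal{I},\; \mathcal{S},\; \{\mathcal{A}_i\}_{i \in \mathcal{I}},\;
T,\; R,\; \{\Omega_i\}_{i \in \mathcal{I}},\; O,\; \gamma \bigr\rangle
```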
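The TOH decomposition in the second highlight can be pictured with a minimal sketch. All names here (`TrajectoryObservationHistory`, `select_for_distillation`, the quantile cutoff) are hypothetical illustrations, not the paper's implementation: a TOH carries per-step (ephemeral) estimated rewards alongside the realized rewards, and only trajectories whose accumulated real reward is high enough are kept as distillation units.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrajectoryObservationHistory:
    """Hypothetical container for one TOH: the per-step observations,
    actions, and rewards collected by a single edge server."""
    observations: List[list] = field(default_factory=list)
    actions: List[int] = field(default_factory=list)
    estimated_rewards: List[float] = field(default_factory=list)  # ephemeral per-step estimates
    real_rewards: List[float] = field(default_factory=list)       # realized rewards

    def accumulated_real_reward(self, gamma: float = 0.99) -> float:
        """Discounted sum of the realized rewards over the trajectory."""
        return sum(r * gamma ** t for t, r in enumerate(self.real_rewards))


def select_for_distillation(histories, gamma=0.99, quantile=0.5):
    """Keep only TOHs whose accumulated real reward clears the given
    quantile -- a stand-in for 'harvesting valuable knowledge and
    filtering out the noise'."""
    scores = [h.accumulated_real_reward(gamma) for h in histories]
    cutoff = sorted(scores)[int(len(scores) * quantile)]
    return [h for h, s in zip(histories, scores) if s >= cutoff]


# Usage: keep the better half of eight toy trajectories.
histories = []
for seed in range(8):
    h = TrajectoryObservationHistory()
    h.real_rewards = [0.1 * seed] * 5
    histories.append(h)
print(len(select_for_distillation(histories)))  # -> 4
```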

Keywords: Data distillation, Deep reinforcement learning, Deadline-aware, Decentralized learning, Edge computing

Article history: Received 16 August 2021, Revised 4 December 2021, Accepted 13 December 2021, Available online 15 January 2022, Version of Record 15 January 2022.

DOI: https://doi.org/10.1016/j.ipm.2021.102850