Asynchronous framework with Reptile+ algorithm to meta learn partially observable Markov decision process

Authors: Dang Quang Nguyen, Ngo Anh Vien, Viet-Hung Dang, TaeChoong Chung

Abstract

Meta-learning has recently received much attention across a wide variety of deep reinforcement learning (DRL) settings. Without meta-learning, a deep neural network controller must learn each specific control task from scratch, which requires a large amount of data and generalises poorly across related tasks. Meta-learning on control domains has therefore become a powerful tool for transfer learning across related tasks. However, meta-learning is widely known to require massive computation and training time. This paper proposes a novel DRL framework called HCGF-R2-DDPG (Hybrid CPU/GPU Framework for Reptile+ and Recurrent Deep Deterministic Policy Gradient), which integrates meta-learning into a general asynchronous training architecture. The proposed framework utilises both the CPU and GPU to speed up training of the meta network initialisation. We evaluate HCGF-R2-DDPG on various Partially Observable Markov Decision Process (POMDP) domains.
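The abstract does not specify the details of Reptile+, but it builds on the standard Reptile meta-update: adapt a copy of the parameters to a sampled task, then move the shared initialisation a fraction of the way toward the adapted weights. The sketch below illustrates that meta-update on a hypothetical toy 1-D regression task distribution; the task sampler and inner loop are illustrative placeholders, not the paper's POMDP setup or its asynchronous CPU/GPU architecture.

```python
import random

def sample_task():
    # Hypothetical task: regress a scalar parameter toward a task-specific target.
    return random.uniform(-1.0, 1.0)

def inner_loop(theta, target, steps=32, lr=0.1):
    # Task-specific adaptation: gradient steps on the loss (theta - target)^2.
    for _ in range(steps):
        grad = 2.0 * (theta - target)
        theta = theta - lr * grad
    return theta

def reptile(meta_iters=500, meta_lr=0.1):
    theta = 0.0  # the meta initialisation being learned
    for _ in range(meta_iters):
        target = sample_task()
        phi = inner_loop(theta, target)      # adapted (task-specific) weights
        theta = theta + meta_lr * (phi - theta)  # Reptile meta-update
    return theta

random.seed(0)
theta = reptile()
# Targets are uniform on [-1, 1], so the learned initialisation drifts
# toward the task-distribution mean (near 0), from which any single
# task can be reached with few adaptation steps.
print(theta)
```

In the paper's setting the inner loop would instead be recurrent DDPG training on a sampled POMDP task, with multiple CPU workers collecting experience and the GPU performing the gradient updates.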

Keywords: Meta learning, Deep reinforcement learning, Partially observable Markov decision process, Asynchronous framework, Recurrent deep deterministic policy gradient

Paper URL: https://doi.org/10.1007/s10489-020-01748-7