Adaptive early classification of temporal sequences using deep reinforcement learning
Abstract
In this article, we address the problem of early classification (EC) of temporal sequences with adaptive prediction times. We frame EC as a sequential decision-making problem and define a partially observable Markov decision process (POMDP) that captures the competing objectives of classification earliness and accuracy. We solve the POMDP by training an agent for EC with deep reinforcement learning (DRL). The agent learns to decide adaptively between classifying an incomplete sequence now and delaying its prediction to gather more measurements. We adapt an existing DRL algorithm for batch and online learning of the agent's action-value function with a deep neural network. We propose strategies of prioritized sampling, prioritized storing, and random episode initialization to address the imbalance in the agent's memory, which arises because (1) all but one of its actions terminate the episode, and therefore (2) classification actions occur far less frequently than the delay action. In experiments, we show accuracy improvements induced by our specific adaptation of the algorithm used for online learning of the agent's action-value function. Moreover, we compare two definitions of the POMDP, based on delay reward shaping, against reward discounting. Finally, we demonstrate that a naive deep neural network trained to classify at fixed times is less efficient, in terms of the accuracy versus speed trade-off, than the equivalent network trained with adaptive decision-making capabilities.
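To make the decision process concrete, below is a minimal sketch, not the authors' implementation, of such an early-classification POMDP: at each step the agent chooses among K terminal classification actions and a single delay action, where the delay carries a shaped penalty that trades accuracy against earliness. The environment class `EarlyClassEnv`, the `delay_penalty` parameter, and the toy Q-network are illustrative assumptions.

```python
# Minimal sketch of an early-classification POMDP with a delay action.
# Assumptions (not from the paper): EarlyClassEnv, delay_penalty, toy Q-net.
import random
import numpy as np
import torch
import torch.nn as nn

K = 2        # number of classes; classify actions are 0..K-1
DELAY = K    # the single non-terminal action

class EarlyClassEnv:
    """One episode = one temporal sequence revealed measurement by measurement."""
    def __init__(self, seq, label, delay_penalty=0.01):
        self.seq, self.label = seq, label
        self.delay_penalty = delay_penalty
        self.t = 1  # could be drawn at random (random episode initialization)

    def observe(self, max_len):
        obs = np.zeros(max_len, dtype=np.float32)
        obs[:self.t] = self.seq[:self.t]  # zero-padded partial sequence
        return obs

    def step(self, action):
        if action == DELAY:
            self.t += 1
            done = self.t >= len(self.seq)   # must stop at the sequence end
            return -self.delay_penalty, done # shaped penalty for waiting
        # every classification action terminates the episode
        return (1.0 if action == self.label else -1.0), True

# Toy action-value network: partial sequence in, K + 1 action values out.
qnet = nn.Sequential(nn.Linear(50, 64), nn.ReLU(), nn.Linear(64, K + 1))

def act(obs, eps=0.1):
    """Epsilon-greedy action selection over classify/delay actions."""
    if random.random() < eps:
        return random.randrange(K + 1)
    with torch.no_grad():
        return int(qnet(torch.from_numpy(obs)).argmax())

# Illustrative rollout on a synthetic sequence.
env = EarlyClassEnv(np.random.randn(50).astype(np.float32), label=1)
done = False
while not done:
    reward, done = env.step(act(env.observe(50)))
```

Because only the delay action is non-terminal, a replay memory filled by such rollouts contains far more delay transitions than classification ones, which is the imbalance the prioritized sampling and storing strategies target.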
Keywords: Early classification, Adaptive prediction time, Deep reinforcement learning, Temporal sequences, Double DQN, Trade-off between accuracy and speed
Article history: Received 1 March 2019, Revised 19 September 2019, Accepted 27 November 2019, Available online 29 November 2019, Version of Record 7 February 2020.
DOI: https://doi.org/10.1016/j.knosys.2019.105290