Graph transformer network with temporal kernel attention for skeleton-based action recognition

Authors:

Highlights:

Abstract

Skeleton-based human action recognition has attracted wide attention because skeleton data adapt robustly to dynamic conditions such as camera view changes and background interference, allowing recognition methods to focus on robust features. In recent studies, the human body is modeled as a topological graph, and a graph convolutional network (GCN) is used to extract action features. Although GCNs have a strong ability to learn spatial patterns, they ignore the varying degrees of higher-order dependencies captured by message passing. Moreover, the joints represented by vertices are interdependent, so incorporating an attention mechanism to weigh these dependencies is beneficial. In this work, we propose a kernel attention adaptive graph transformer network (KA-AGTN), which models the higher-order spatial dependencies between joints with a graph transformer operator based on multi-head self-attention. In addition, the Temporal Kernel Attention (TKA) block in KA-AGTN generates a channel-level attention score from temporal features, which enhances temporal motion correlation. Combined with the two-stream framework and an adaptive graph strategy, KA-AGTN outperforms the baseline 2s-AGCN by 1.9% and 1.0% under X-Sub and X-View on the NTU-RGBD 60 dataset, by 3.2% and 3.1% under X-Sub and X-Set on the NTU-RGBD 120 dataset, and by 2.0% and 2.3% under Top-1 and Top-5 on the Kinetics-Skeleton 400 dataset, achieving state-of-the-art performance.
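The abstract describes the TKA block only at a high level: it pools temporal features into one attention score per channel and rescales the feature map. As a hedged illustration, the NumPy sketch below shows one common way such channel-level attention is computed (a squeeze-and-excitation-style bottleneck; the function name, weight shapes, and pooling choice are assumptions for illustration, not the authors' exact design):

```python
import numpy as np

def temporal_kernel_attention(x, w1, w2):
    """Illustrative channel-level attention over skeleton features.

    x : ndarray of shape (C, T, V) -- channels, frames, joints (assumed layout).
    w1, w2 : bottleneck weights of shapes (C//2, C) and (C, C//2).
    Returns x rescaled by a per-channel score in (0, 1).
    """
    # Squeeze: global average pooling over the temporal and joint axes -> (C,)
    pooled = x.mean(axis=(1, 2))
    # Excite: two-layer bottleneck producing one score per channel
    hidden = np.maximum(0.0, w1 @ pooled)          # ReLU
    scores = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid keeps scores in (0, 1)
    # Broadcast the channel scores back over time and joints
    return x * scores[:, None, None]

rng = np.random.default_rng(0)
C, T, V = 8, 16, 25  # channels, frames, joints (NTU skeletons have 25 joints)
x = rng.standard_normal((C, T, V))
w1 = rng.standard_normal((C // 2, C))
w2 = rng.standard_normal((C, C // 2))
y = temporal_kernel_attention(x, w1, w2)
print(y.shape)  # (8, 16, 25)
```

Because the scores lie in (0, 1), the block can only attenuate channels, never amplify them; a trained network learns which motion channels to suppress.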

Keywords: Skeleton-based data, Action recognition, Graph transformer, Kernel attention, Spatio-temporal graph

Article history: Received 1 September 2021, Revised 10 December 2021, Accepted 3 January 2022, Available online 10 January 2022, Version of Record 25 January 2022.

DOI: https://doi.org/10.1016/j.knosys.2022.108146