SAPS: Self-Attentive Pathway Search for weakly-supervised action localization with background-action augmentation

Authors:

Highlights:

Abstract

Weakly supervised temporal action localization is a challenging computer vision task that aims to derive frame-level action labels from only video-level supervision. The attention mechanism is a widely used paradigm for action recognition and localization in recent methods. However, existing attention-based methods mostly capture the global dependency of the frame sequence while neglecting local inter-frame relations. Moreover, during background modeling, diverse background contents are typically lumped into a single category, which inevitably weakens the discriminative ability of classifiers and introduces irrelevant noise. In this paper, we present a novel self-attentive pathway search framework, namely SAPS, to address the above challenges. To obtain comprehensive representations with discriminative attention weights, we design a NAS-based attentive module with a path-level searching process, and construct a competitive attention structure that captures both local and global dependencies. Furthermore, we propose action-related background modeling for robust background-action augmentation, where knowledge derived from the background provides informative clues for action recognition. An ensemble T-CAM operation is then designed to incorporate background information and further refine the temporal action localization results. Extensive experiments on two benchmark datasets (i.e., THUMOS14 and ActivityNet1.2) corroborate the efficacy of our method.
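To make the two ideas in the abstract concrete, the following is a minimal, illustrative sketch (not the authors' implementation): an attention module that combines a local (convolutional) branch with a global (self-attention) branch over frame features, and an "ensemble" temporal class activation map that fuses action and background scores. All module names, dimensions, and the fusion weight `alpha` are assumptions for illustration only.

```python
# Hedged sketch of (1) local + global attention over frame features and
# (2) an ensemble T-CAM fusion; names and hyper-parameters are assumed.
import torch
import torch.nn as nn


class LocalGlobalAttention(nn.Module):
    """Produce per-frame attention weights from both local and global context."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # Local branch: 1-D temporal convolution captures short-range dependencies.
        self.local = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)
        # Global branch: single-head self-attention captures long-range dependencies.
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                                         # x: (B, T, D)
        local = self.local(x.transpose(1, 2)).transpose(1, 2)     # (B, T, D)
        global_, _ = self.attn(x, x, x)                           # (B, T, D)
        fused = local + global_                                   # combine both views
        return torch.sigmoid(self.score(fused)).squeeze(-1)       # (B, T) weights


def ensemble_tcam(action_cam, background_cam, alpha=0.5):
    """Fuse action and background T-CAMs (both (B, T, C)): suppress activations
    that the background branch claims, keeping action-consistent frames."""
    return action_cam - alpha * background_cam
```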

Keywords:

Article history: Received 15 September 2020, Revised 10 May 2021, Accepted 24 July 2021, Available online 3 August 2021, Version of Record 9 August 2021.

DOI: https://doi.org/10.1016/j.cviu.2021.103256