TVENet: Temporal variance embedding network for fine-grained action representation

Abstract

With the breakthroughs in general action understanding, analyzing actions at a finer granularity has become an inevitable trend. However, related research has been largely hindered by the lack of fine-grained datasets and by the difficulty of capturing the subtle differences between fine-grained actions that are highly similar overall. In this paper, we address these challenges by constructing a fine-grained action dataset, namely Figure Skating, which can be used for end-to-end network training, and by presenting a framework that jointly optimizes classification and similarity constraints. We propose to incorporate the triplet loss into the training of a convolutional neural network, which learns a mapping from fine-grained actions to a compact Euclidean space where distances directly correspond to a measure of action similarity. The triplet loss compels actions of different classes to have larger distances than actions of the same class. In addition, to improve the discrimination of fine-grained actions, we further propose a temporal variance embedding network (TVENet) that embeds temporal context variances into the feature embeddings during joint network training. Experimental results on the Figure Skating, HMDB51, and UCF101 datasets demonstrate the effectiveness of the TVENet representation for fine-grained action search.
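As a rough illustration of the similarity constraint described above, the sketch below computes a common squared-Euclidean triplet hinge loss, a hypothetical joint objective that adds it to softmax cross-entropy with an assumed weight `lam`, and nearest-neighbour action search over embeddings. It is a generic sketch, not the authors' implementation: the abstract does not specify the margin, the loss weighting, or how TVENet's temporal variances or the paper's temporal triplet loss are computed, so `margin=0.2`, `lam=1.0`, and all function names here are hypothetical.

```python
import math
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """Hinge loss: zero once the different-class (negative) sample is at least
    `margin` farther from the anchor than the same-class (positive) sample.
    The margin value is an assumption, not taken from the paper."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(logits, label, anchor, positive, negative, lam: float = 1.0) -> float:
    """Hypothetical joint objective: classification term plus weighted similarity term."""
    return cross_entropy(logits, label) + lam * triplet_loss(anchor, positive, negative)


def search(query: Sequence[float], gallery: List[Sequence[float]], k: int = 1) -> List[int]:
    """Return indices of the k gallery embeddings closest to the query."""
    order = sorted(range(len(gallery)), key=lambda i: sq_dist(query, gallery[i]))
    return order[:k]


if __name__ == "__main__":
    a, p = [0.0, 0.0], [0.1, 0.0]
    print(triplet_loss(a, p, [1.0, 0.0]))  # easy negative: constraint already satisfied
    print(triplet_loss(a, p, [0.2, 0.0]))  # hard negative: positive loss pushes it away
    print(search([0.0, 0.0], [[1.0, 0.0], [0.1, 0.0], [5.0, 5.0]], k=2))
```

Once trained, action search in this setting reduces to ranking gallery clips by distance in the learned space, which is why the triplet term, rather than the classifier alone, shapes the retrieval quality.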

Keywords: Fine-grained action representation, temporal variance embedding network (TVENet), joint optimization, temporal triplet loss, action search

Article history: Received 28 May 2019, Revised 13 January 2020, Accepted 10 February 2020, Available online 21 February 2020, Version of Record 27 February 2020.

Official paper URL: https://doi.org/10.1016/j.patcog.2020.107267