TCLR: Temporal contrastive learning for video representation

Authors:

Highlights:

• TCLR is a contrastive learning framework for video understanding tasks.

• Explicitly enforces within-instance temporal feature variation without pretext tasks.

• Proposes novel local–local and global–local temporal contrastive losses.

• Significantly outperforms state-of-the-art pre-training methods on video understanding tasks.

• Uses a fine-grained action classification task to evaluate the learned representations.
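
The highlights name a within-instance temporal contrastive objective but do not give its form. As a rough illustration only, the sketch below shows one plausible reading: an InfoNCE-style loss in which the positive for a clip is the same timestep under a second augmentation, and the negatives are the other timesteps of the same video. The function names, the cosine similarity, and the `temperature` value are assumptions made for this example and are not taken from the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def within_instance_temporal_loss(view_a, view_b, temperature=0.1):
    """Hypothetical InfoNCE-style loss over clips of ONE video.

    view_a[t] and view_b[t] are embeddings of the clip at timestep t
    under two different augmentations (assumed setup, not the paper's code).
    """
    num_clips = len(view_a)
    total = 0.0
    for t in range(num_clips):
        anchor = view_a[t]
        # Positive: same timestep, other augmentation.
        pos = math.exp(cosine(anchor, view_b[t]) / temperature)
        # Negatives: every other timestep of the same video, both views.
        neg = 0.0
        for s in range(num_clips):
            if s != t:
                neg += math.exp(cosine(anchor, view_a[s]) / temperature)
                neg += math.exp(cosine(anchor, view_b[s]) / temperature)
        total += -math.log(pos / (pos + neg))
    return total / num_clips

# Features that differ across time versus features that are constant in time.
distinct = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
collapsed = [[1.0, 0.0, 0.0]] * 3
loss_distinct = within_instance_temporal_loss(distinct, distinct)
loss_collapsed = within_instance_temporal_loss(collapsed, collapsed)
```

Under these assumptions, the loss is smaller when clip features vary across timesteps than when they collapse to a single vector, which is the behaviour the "temporal feature variation" highlight describes.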

Keywords: Self-Supervised Learning, Action Recognition, Video Representation

Review history: Received 10 August 2021, Revised 7 January 2022, Accepted 5 March 2022, Available online 16 March 2022, Version of Record 5 April 2022.

Official paper URL: https://doi.org/10.1016/j.cviu.2022.103406