Discriminative key-component models for interaction detection and recognition

Authors:

Highlights:

Abstract

Not all frames are equal: selecting a subset of discriminative frames from a video can improve performance in detecting and recognizing human interactions. In this paper we present models for categorizing a video into one of a number of predefined interactions, or for detecting these interactions in a long video sequence. The models represent an interaction by a set of key temporal moments and the spatial structures they entail; for instance, two people approach each other, then extend their hands, before engaging in a "handshaking" interaction. Learning the model parameters requires only weak supervision in the form of an overall label for the interaction. Experimental results on the UT-Interaction and VIRAT datasets verify the efficacy of these structured models for human interactions.
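To make the key-component idea concrete, the sketch below shows a minimal, hypothetical version of the inference step: a video is scored for each interaction class by summing the scores of its most discriminative frames, and the highest-scoring class is returned. Function names, feature dimensions, and the simple top-K frame selection are illustrative assumptions, not the paper's exact model, which also encodes temporal ordering and spatial structure among the key moments.

```python
import numpy as np

def score_video(frames, w, num_key_frames=3):
    """Score a video for one interaction class by selecting its most
    discriminative frames (a simplified stand-in for key-component inference).

    frames: (T, D) array of per-frame feature vectors.
    w:      (D,) class-specific weight vector.
    """
    frame_scores = frames @ w                      # per-frame evidence for this class
    top = np.sort(frame_scores)[-num_key_frames:]  # keep the K best-scoring frames
    return top.sum()

def classify(frames, weights, num_key_frames=3):
    """Pick the interaction class whose selected key frames score highest.

    weights: dict mapping class label -> (D,) weight vector, assumed to be
    learned from weakly supervised (video-level) labels.
    """
    scores = {label: score_video(frames, w, num_key_frames)
              for label, w in weights.items()}
    return max(scores, key=scores.get)

# Toy usage with random features and two hypothetical interaction classes.
rng = np.random.default_rng(0)
frames = rng.normal(size=(40, 16))            # 40 frames, 16-D features each
weights = {"handshake": rng.normal(size=16),
           "hug": rng.normal(size=16)}
print(classify(frames, weights))
```

Because supervision is only a video-level label, the key-frame choice acts as a latent variable during training; the toy code above shows only the inference side under that assumption.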

Keywords:

Article history: Received 7 September 2014, Accepted 24 February 2015, Available online 4 March 2015.

Paper URL: https://doi.org/10.1016/j.cviu.2015.02.012