Recognizing actions in images by fusing multiple body structure cues

Authors:

Highlights:

• We propose a unified model for recognizing human actions in static images. It explicitly exploits body structure information and integrates body structure exploration and action classification into a single model. Moreover, we design a two-step learning technique in which keypoint estimation provides intermediate supervision for learning human action representations.

• We design two body structure cues, SBPs and LAD, to fully explore the structure information of human bodies from the local and global perspectives.

• To construct body parts at different scales in unconstrained images, we propose a technique that uses human keypoint heatmaps to generate scale-adaptive SBPs, which extract fine-grained local human features. Moreover, we propose a technique that automatically determines the most discriminative body part of each action category for identifying the ongoing action. To extract global high-level body structure features, we propose the LAD, which models the spatial angle relationship between pairs of human limbs. The LAD is more robust and achieves better performance than the distance-based skeleton descriptor.

• We evaluate our model on two challenging image-based action datasets, and the results show that our method achieves state-of-the-art performance.
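
The highlights describe the LAD only as modeling the spatial angle relationship between pairs of limbs, so the following is a minimal sketch of that idea and not the paper's formulation. The keypoint names, the `LIMBS` list, and the function names (`limb_vector`, `angle_between`, `limb_angle_descriptor`) are all assumptions made for illustration. Keypoints are taken as already estimated 2-D coordinates.

```python
import math
from itertools import combinations

# Hypothetical skeleton: these keypoint names and limb definitions are
# illustrative assumptions, not taken from the paper.
LIMBS = [
    ("shoulder", "elbow"),
    ("elbow", "wrist"),
    ("hip", "knee"),
    ("knee", "ankle"),
]


def limb_vector(keypoints, limb):
    """Return the 2-D vector pointing from the limb's first joint to its second."""
    (x0, y0), (x1, y1) = keypoints[limb[0]], keypoints[limb[1]]
    return (x1 - x0, y1 - y0)


def angle_between(u, v):
    """Unsigned angle in radians between 2-D vectors u and v, in [0, pi]."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    # atan2 on (|cross|, dot) avoids the precision loss of acos near 0 and pi.
    return math.atan2(abs(cross), dot)


def limb_angle_descriptor(keypoints, limbs=LIMBS):
    """Angles between every unordered pair of limbs, in a fixed pair order."""
    vectors = [limb_vector(keypoints, limb) for limb in limbs]
    return [angle_between(u, v) for u, v in combinations(vectors, 2)]


# Toy pose: bent arm (right angle at the elbow) and a straight leg.
pose = {
    "shoulder": (0.0, 0.0),
    "elbow": (1.0, 0.0),
    "wrist": (1.0, 1.0),
    "hip": (0.0, 2.0),
    "knee": (0.0, 3.0),
    "ankle": (0.0, 4.0),
}
descriptor = limb_angle_descriptor(pose)
```

In this sketch, uniformly rescaling every keypoint leaves the descriptor unchanged, whereas pairwise joint distances would scale with the person's size in the image. That may be one reason an angle-based cue is less sensitive to scale than a distance-based one, though the highlights do not state the reason themselves.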

Keywords: Image-based action recognition, Convolutional neural network, Body structure cues

Article history: Received 14 March 2019, Revised 1 March 2020, Accepted 17 March 2020, Available online 31 March 2020, Version of Record 11 May 2020.

Official URL: https://doi.org/10.1016/j.patcog.2020.107341