View Transfer on Human Skeleton Pose: Automatically Disentangle the View-Variant and View-Invariant Information for Pose Representation Learning

Authors: Qiang Nie, Yunhui Liu

Abstract

Learning a good pose representation is important for many applications, such as human pose estimation and action recognition. However, the representations learned by most approaches are not intrinsic, and their transferability across different datasets and tasks is limited. In this paper, we introduce a method to learn a versatile representation that is capable of recovering unseen corrupted skeletons, being applied to human action recognition, and transferring poses from one view to another without knowing the camera relationships. To this end, a sequential bidirectional recursive network (SeBiReNet) is proposed for modeling the kinematic dependency between skeleton joints. Using the SeBiReNet as the core module, a denoising autoencoder is designed to learn intrinsic pose features through the task of recovering corrupted skeletons. Instead of only extracting the view-invariant feature, as many other methods do, we disentangle the view-invariant feature from the view-variant feature in the latent space and use them together as the representation of a human pose. For better feature disentanglement, an adversarial augmentation strategy is proposed and applied to the denoising autoencoder. Disentangling the view-variant and view-invariant features enables us to realize view transfer on 3D poses. Extensive experiments on different datasets and tasks verify the effectiveness and versatility of the learned representation.
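To make the disentanglement idea concrete, below is a minimal sketch of a denoising autoencoder whose latent code is split into a view-invariant part and a view-variant part. This is an illustrative assumption, not the paper's implementation: the plain MLP encoder/decoder, the dimensions, and all names (DisentanglingDAE, inv_dim, var_dim) are hypothetical, whereas the paper's core module is the proposed SeBiReNet and training additionally uses an adversarial augmentation strategy.

```python
# Minimal sketch (assumed, not the authors' code): a denoising autoencoder
# with a latent code split into view-invariant and view-variant parts.
import torch
import torch.nn as nn

class DisentanglingDAE(nn.Module):
    def __init__(self, num_joints=25, inv_dim=64, var_dim=16):
        super().__init__()
        in_dim = num_joints * 3  # flattened 3D joint coordinates
        # Simple MLP stand-in for the paper's SeBiReNet encoder.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, inv_dim + var_dim),
        )
        self.inv_dim = inv_dim
        self.decoder = nn.Sequential(
            nn.Linear(inv_dim + var_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim),
        )

    def forward(self, corrupted_pose):
        z = self.encoder(corrupted_pose)
        # Split the latent code: z_inv is view-invariant, z_var is view-variant.
        z_inv, z_var = z[:, :self.inv_dim], z[:, self.inv_dim:]
        recovered = self.decoder(torch.cat([z_inv, z_var], dim=1))
        return recovered, z_inv, z_var

# Denoising objective: recover the clean skeleton from a corrupted input.
model = DisentanglingDAE()
clean = torch.randn(8, 25 * 3)                      # batch of clean poses
corrupted = clean + 0.05 * torch.randn_like(clean)  # synthetic corruption
recovered, z_inv, z_var = model(corrupted)
loss = nn.functional.mse_loss(recovered, clean)

# View-transfer idea: decode pose A's view-invariant code together with
# pose B's view-variant code to render A's pose as seen from B's view.
transferred = model.decoder(torch.cat([z_inv[:1], z_var[1:2]], dim=1))
```

Swapping the view-variant code between two poses while keeping the view-invariant code fixed is what makes view transfer on 3D poses possible once the two factors are cleanly disentangled.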

Keywords: Representation learning, Human skeleton pose, View transfer, Unsupervised action recognition


Paper URL: https://doi.org/10.1007/s11263-020-01354-7