Image-to-video person re-identification using three-dimensional semantic appearance alignment and cross-modal interactive learning

Authors:

Highlights:

• A deep image-to-video person re-identification pipeline with two modules is proposed to learn fine-grained and temporally invariant features.

• To address appearance misalignment, a 3D-SAA module is designed to semantically align different human body parts in the 3D surface space.

• To address modality misalignment, a CMIL module is developed to fuse the two modalities with an interactive similarity comparison mechanism.

• A multi-branch aggregation network in the 3D-SAA module is designed to weaken the influence of negligible body parts and backgrounds.

Keywords: Person re-identification, Cross-modal learning, Appearance alignment

Review history: Received 19 November 2020, Revised 20 June 2021, Accepted 9 September 2021, Available online 20 September 2021, Version of Record 24 September 2021.

Official paper URL: https://doi.org/10.1016/j.patcog.2021.108314