SFGN: Representing the sequence with one super frame for video person re-identification

Authors:

Highlights:

Abstract:

Video-based person re-identification (V-Re-ID) is more robust than image-based person re-identification (I-Re-ID) because of the additional temporal information. However, the high storage overhead of video sequences largely hinders the application of V-Re-ID. To reduce this storage overhead, we propose to represent each video sequence with only one frame. Directly picking one frame from each sequence, however, degrades performance dramatically. We therefore propose a new framework called the super frame generation network (SFGN), which encodes the spatial–temporal information of a video sequence into a generated frame, called a "super frame" to distinguish it from a directly picked "key frame". To obtain super frames with high visual quality and strong representation ability, we carefully design the specific-frame-feature fused skip-connection generator (SFSG). SFSG plays the role of a feature encoder, and the co-trained image model can be seen as the corresponding feature decoder. To reduce the information loss in this encoding–decoding process, we further propose the feature recovery loss (FRL). To the best of our knowledge, we are the first to identify and address this issue. Extensive experiments on MARS, iLIDS-VID, and PRID2011 show that the proposed SFGN generates super frames of high visual quality and strong representation ability. For the code, please visit the project website: .
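The encoder–decoder view described in the abstract (SFSG as encoder, the co-trained image model as decoder) suggests a simple way to picture the feature recovery loss. The sketch below is a minimal, hypothetical PyTorch formulation: the module names `generator` and `image_model`, the sequence-level feature `seq_features`, and the use of an L2 penalty are illustrative assumptions, not the paper's exact definition of FRL.

```python
import torch
import torch.nn.functional as F

def feature_recovery_loss(seq_features: torch.Tensor,
                          recovered_features: torch.Tensor) -> torch.Tensor:
    """Penalize the gap between the sequence-level feature and the feature
    re-extracted from the generated super frame (illustrative L2 form)."""
    return F.mse_loss(recovered_features, seq_features)

def training_step(generator, image_model, video_clip, seq_features):
    # Encode the whole clip into a single generated "super frame".
    super_frame = generator(video_clip)        # (B, 3, H, W)
    # Decode: the co-trained image model extracts a re-id feature from it.
    recovered = image_model(super_frame)       # (B, D)
    return feature_recovery_loss(seq_features, recovered)
```

In this reading, the loss encourages the super frame to carry enough information for the image model to recover a feature close to the one computed from the full sequence; the paper's actual objective may combine this with identity and adversarial terms.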

Keywords: Person re-identification, Video-based, Super frame, Deep learning

Article history: Received 1 December 2021, Revised 19 April 2022, Accepted 20 April 2022, Available online 5 May 2022, Version of Record 14 May 2022.

DOI: https://doi.org/10.1016/j.knosys.2022.108884