Multi-modal generative adversarial network for zero-shot learning

Authors:

Highlights:

Abstract

In this paper, we propose a novel approach for Zero-Shot Learning (ZSL), where the test instances come from novel categories for which no visual data are available during training. Existing approaches typically address ZSL by embedding visual features into a category-shared semantic space. However, these embedding-based approaches are prone to the “heterogeneity gap” issue, since a single type of class semantic prototype cannot characterize the categories well. To alleviate this issue, we assume that different class semantics reflect different views of the corresponding class, and thus fuse various types of class semantic prototypes residing in different semantic spaces with a feature fusion network to generate pseudo visual features. Through an adversarial mechanism between the real visual features and the fused pseudo visual features, the complementary semantics in the various spaces are effectively captured. Experimental results on three benchmark datasets demonstrate that the proposed approach achieves impressive performance on both the traditional ZSL and generalized ZSL tasks.
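
To make the fuse-then-discriminate idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: two semantic modalities (an attribute prototype and a word-embedding prototype) are fused by concatenation in a small fusion network that generates pseudo visual features, and a discriminator is trained adversarially against real visual features. The dimensions (85-d attributes, 300-d word vectors, 2048-d visual features), the concatenation-based fusion, and the vanilla GAN loss are all illustrative assumptions.

```python
# Minimal sketch of multi-modal semantic fusion + adversarial training for ZSL.
# All dimensions, layer sizes, and the BCE-based GAN loss are assumptions.
import torch
import torch.nn as nn

ATTR_DIM, W2V_DIM, NOISE_DIM, VIS_DIM = 85, 300, 128, 2048

class FusionGenerator(nn.Module):
    """Fuses class semantic prototypes from different semantic spaces
    into pseudo visual features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ATTR_DIM + W2V_DIM + NOISE_DIM, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, VIS_DIM),
            nn.ReLU(),  # CNN features (e.g., ResNet) are non-negative
        )

    def forward(self, attr, w2v, noise):
        # Concatenation is the simplest fusion; the paper's fusion
        # network may be more elaborate.
        return self.net(torch.cat([attr, w2v, noise], dim=1))

class Discriminator(nn.Module):
    """Scores whether a visual feature is real or fused-pseudo."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(VIS_DIM, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, 1),
        )

    def forward(self, x):
        return self.net(x)

# One illustrative adversarial step on random stand-in data.
G, D = FusionGenerator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

real_vis = torch.rand(32, VIS_DIM)   # stand-in for real CNN features
attr = torch.rand(32, ATTR_DIM)      # attribute prototypes
w2v = torch.randn(32, W2V_DIM)       # word-embedding prototypes
noise = torch.randn(32, NOISE_DIM)

# Discriminator step: real features -> 1, fused pseudo features -> 0.
fake_vis = G(attr, w2v, noise)
d_loss = bce(D(real_vis), torch.ones(32, 1)) + \
         bce(D(fake_vis.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: make fused pseudo features indistinguishable from real.
g_loss = bce(D(G(attr, w2v, noise)), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Once trained, a generator of this kind can synthesize pseudo visual features for unseen classes from their semantic prototypes alone, after which an ordinary classifier can be trained on the synthesized features.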

Keywords: Zero-shot learning, Multi-modal generative adversarial network, Feature fusion

Article history: Received 16 January 2020, Revised 18 March 2020, Accepted 29 March 2020, Available online 7 April 2020, Version of Record 24 April 2020.

DOI: https://doi.org/10.1016/j.knosys.2020.105847