Guest Editorial: Language in Vision
Visual question answering: Datasets, algorithms, and future challenges
Visual question answering: A survey of methods and datasets
Vision-language integration using constrained local semantic features
Recognizing semantic correlation in image-text weibo via feature space mapping
Simple to complex cross-modal learning to rank
Weakly supervised learning of actions from transcripts
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
Resolving vision and language ambiguities together: Joint segmentation & prepositional attachment resolution in captioned scenes
Hierarchical & multimodal video captioning: Discovering and transferring multimodal knowledge for vision to language
Learning explicit video attributes from mid-level representation for video captioning