Natural Language Description of Human Activities from Video Images Based on Concept Hierarchy of Actions

Authors: Atsuhiro Kojima, Takeshi Tamura, Kunio Fukunaga

Abstract

We propose a method for describing human activities from video images based on concept hierarchies of actions. A major difficulty in transforming video images into textual descriptions is bridging the semantic gap between them, a problem also known as the inverse Hollywood problem. In general, concepts of human events or actions can be classified by semantic primitives. By associating these concepts with the semantic features extracted from video images, appropriate syntactic components such as verbs and objects are determined and then translated into natural language sentences. We also demonstrate the performance of the proposed method through several experiments.
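The pipeline outlined in the abstract (semantic features, then a concept from the hierarchy, then a case frame, then a sentence) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the concept names, the feature keys (`moving`, `posture`, `boundary`, `target`), the rule of picking the deepest concept whose conditions hold, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """One node of a hypothetical concept hierarchy of actions."""
    name: str
    parent: Optional[str]            # more general concept, None for the root
    conditions: dict                 # semantic features this concept requires
    verb: str                        # surface verb (third person singular)
    goal_prep: Optional[str] = None  # None: no goal case; "": direct object


# Illustrative hierarchy only; the paper's actual hierarchy is not given here.
CONCEPTS = [
    Concept("be", None, {}, "is present"),
    Concept("move", "be", {"moving": True}, "moves"),
    Concept("walk", "move", {"posture": "standing"}, "walks", "toward"),
    Concept("enter", "walk", {"boundary": "crossed_in"}, "enters", ""),
]
TABLE = {c.name: c for c in CONCEPTS}


def depth(name: str) -> int:
    """Number of ancestors between a concept and the root."""
    d = 0
    while TABLE[name].parent is not None:
        name = TABLE[name].parent
        d += 1
    return d


def satisfied(name: Optional[str], features: dict) -> bool:
    """A concept applies only if its own and all ancestors' conditions hold."""
    while name is not None:
        concept = TABLE[name]
        if any(features.get(k) != v for k, v in concept.conditions.items()):
            return False
        name = concept.parent
    return True


def select_concept(features: dict) -> Concept:
    """Pick the most specific (deepest) concept matching the features."""
    candidates = [n for n in TABLE if satisfied(n, features)]
    return TABLE[max(candidates, key=depth)]


def build_case_frame(features: dict) -> dict:
    """Fill a simple case frame: predicate, agent, and optionally a goal."""
    concept = select_concept(features)
    frame = {"predicate": concept.verb, "agent": features["agent"]}
    if concept.goal_prep is not None and "target" in features:
        frame["goal"] = (concept.goal_prep, features["target"])
    return frame


def realize(frame: dict) -> str:
    """Turn a case frame into an English sentence."""
    words = [frame["agent"], frame["predicate"]]
    if "goal" in frame:
        prep, noun = frame["goal"]
        words += ([prep] if prep else []) + [noun]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


features = {"agent": "the person", "moving": True, "posture": "standing",
            "boundary": "crossed_in", "target": "the room"}
print(realize(build_case_frame(features)))  # The person enters the room.
```

With the `boundary` feature removed and the target changed to `"the desk"`, the same code falls back to the more general concept and yields "The person walks toward the desk.", which shows how a hierarchy lets the description degrade gracefully when fewer features are observed.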

Keywords: natural language generation, concept hierarchy, semantic primitive, position/posture estimation of human, case frame

Paper URL: https://doi.org/10.1023/A:1020346032608