Hierarchical Temporal Fusion of Multi-grained Attention Features for Video Question Answering. Evaluation Results

Evaluation Details

9