Video prediction: a step-by-step improvement of a video synthesis network

Authors: Beibei Jing, Hongwei Ding, Zhijun Yang, Bo Li, Liyong Bao

Abstract

Although research on video generation has made progress in network performance and computational efficiency, there is still considerable room for improvement in the number and clarity of predicted frames. In this paper, a deep learning model is proposed to predict future video frames. The model can predict video streams with complex pixel distributions of up to 32 frames. Our framework consists of two modules: a fusion image prediction generator and an image-video translator. The fusion image prediction generator is a U-Net neural network built from 3D convolutions, and the image-video translator is a conditional generative adversarial network built from 2D convolutions. In the proposed framework, given a set of fusion images and labels, the fusion image prediction generator learns to fit the pixel distribution of the label images from the fusion images. The image-video translator then translates the output of the fusion image prediction generator into future video frames. In addition, this paper proposes an accompanying convolution model and a corresponding algorithm for improving image sharpness. Our experimental results demonstrate the effectiveness of this framework.
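
The two-stage pipeline described in the abstract (a 3D-convolutional U-Net that produces a fused prediction image, followed by a 2D-convolutional conditional GAN that expands it into future frames) can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the authors' implementation: every layer width, kernel size, and class name (FusionImageGenerator, ImageVideoTranslator, ConditionalDiscriminator) is an assumption chosen only to show how the two modules connect.

```python
import torch
import torch.nn as nn


class FusionImageGenerator(nn.Module):
    """Tiny 3D-conv U-Net: maps an input clip to one fused prediction
    image. All layer sizes are illustrative, not the paper's."""
    def __init__(self, in_ch=3, base=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv3d(in_ch, base, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(
            nn.Conv3d(base, base * 2, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.Sequential(
            nn.ConvTranspose3d(base * 2, base, 4, stride=2, padding=1), nn.ReLU())
        self.out = nn.Conv3d(base * 2, in_ch, 3, padding=1)  # 2*base after skip concat

    def forward(self, x):                      # x: (B, C, T, H, W)
        e = self.enc(x)
        d = self.up(self.down(e))
        d = torch.cat([d, e], dim=1)           # U-Net skip connection
        return self.out(d).mean(dim=2)         # average over time -> (B, C, H, W)


class ImageVideoTranslator(nn.Module):
    """2D-conv generator that expands the fused image into K future
    frames by predicting K*C output channels (hypothetical layout)."""
    def __init__(self, in_ch=3, frames=32, base=32):
        super().__init__()
        self.in_ch, self.frames = in_ch, frames
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, frames * in_ch, 3, padding=1), nn.Tanh())

    def forward(self, fused):                  # fused: (B, C, H, W)
        y = self.net(fused)
        b, _, h, w = y.shape
        return y.view(b, self.in_ch, self.frames, h, w)  # (B, C, K, H, W)


class ConditionalDiscriminator(nn.Module):
    """PatchGAN-style critic over (condition, frame) channel pairs,
    standing in for the conditional-GAN discriminator."""
    def __init__(self, in_ch=6, base=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(base, 1, 4, stride=2, padding=1))

    def forward(self, cond, frame):            # both (B, 3, H, W)
        return self.net(torch.cat([cond, frame], dim=1))


if __name__ == "__main__":
    clip = torch.randn(2, 3, 8, 64, 64)        # batch of 8-frame input clips
    fused = FusionImageGenerator()(clip)       # (2, 3, 64, 64)
    video = ImageVideoTranslator()(fused)      # (2, 3, 32, 64, 64): 32 future frames
    score = ConditionalDiscriminator()(fused, video[:, :, 0])
    print(fused.shape, video.shape, score.shape)
```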

Keywords: Video generation, Fusion image prediction generator, Image-video translator, 3D convolution, Conditional generative adversarial nets, 2D convolution network

Paper URL: https://doi.org/10.1007/s10489-021-02500-5