Timed-image based deep learning for action recognition in video sequences

Authors:

Highlights:

• Image data conditioning issue: the paper first highlights that a 2D spatial convolution can be approximated with high accuracy by its 1D Hilbert-based counterpart, in terms of information compressibility, for image frames from a wide class of video files.

• Video library conditioning issue: because of this compressibility, the paper proposes converting a 2D + X data volume into a single meta-image file format, called a timed-image, before it is passed to machine learning frameworks. In this conversion, each 2D frame of the 2D + X data is reshaped into a 1D array indexed by a Hilbert space-filling curve, and the third variable X of the initial file format becomes the second variable of the meta-image format.

• Sensitive action recognition benchmark: the paper provides two datasets with 2 and 3 violence video categories, respectively. The datasets cover non-violent, moderately violent, and extremely violent visual actions.

• Sensitive action recognition issue: outstanding 2-level and 3-level violence classification results are obtained with deep convolutional neural networks trained from scratch and operating on meta-image databases.
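
The timed-image conversion described in the highlights can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes square frames whose side is a power of two, uses the standard iterative index-to-coordinate Hilbert mapping, and the names `hilbert_d2xy` and `to_timed_image` are hypothetical. Each frame is flattened along the Hilbert curve into one column, and the third variable X (here, the frame index) becomes the second axis of the meta-image.

```python
def hilbert_d2xy(n, d):
    """Map position d along a Hilbert curve to (x, y) on an n x n grid.

    Assumes n is a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves join up.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def to_timed_image(frames):
    """Convert a list of n x n frames into a timed-image (illustrative only).

    frames[k][y][x] is the pixel at row y, column x of frame k.
    The result has n * n rows (Hilbert index) and len(frames) columns (X axis).
    """
    n = len(frames[0])
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("frame side must be a power of two (assumption of this sketch)")
    for frame in frames:
        if len(frame) != n or any(len(row) != n for row in frame):
            raise ValueError("all frames must be square with the same side")
    # Hilbert traversal order, computed once and reused for every frame.
    order = [hilbert_d2xy(n, d) for d in range(n * n)]
    return [[frame[y][x] for frame in frames] for (x, y) in order]
```

Because consecutive Hilbert indices always map to neighbouring pixels, pixels that are close along a column of the timed-image are also close in the original frame, which is the locality property the conversion relies on. Frames whose side is not a power of two would need padding or resizing first, a step this sketch leaves out.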

Keywords: Data conditioning, Video analysis, Deep learning, Convolution frames, Hilbert space-filling curve, Action recognition, Violence detection

Review history: Received 18 April 2019, Revised 26 March 2020, Accepted 29 March 2020, Available online 3 April 2020, Version of Record 11 May 2020.

Official paper URL: https://doi.org/10.1016/j.patcog.2020.107353