Cloud-based scalable object recognition from video streams using orientation fusion and convolutional neural networks

Authors:

Highlights:

• This paper pioneers the use of empirical mode decomposition with CNNs to improve visual object recognition accuracy on challenging video datasets.

• We study the orientation, phase and amplitude components and show how each performs in terms of visual recognition accuracy (an illustrative sketch of these components follows the highlights).

• We show that the orientation component is a good candidate for achieving high object recognition accuracy on illumination- and expression-variant video datasets.

• We propose a feature-fusion strategy for the orientation components to further improve accuracy.

• We show that the orientation-fusion approach significantly improves visual recognition accuracy under challenging conditions.
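
The highlights do not spell out how the orientation, phase and amplitude components are obtained. The sketch below is a minimal, hypothetical illustration: it assumes these local components come from a Riesz-transform (monogenic-signal) analysis of an image or of an EMD sub-band, and that orientation fusion amounts to stacking orientation maps as the channels of a CNN input tensor. The function names (`monogenic_components`, `orientation_fusion_input`) are illustrative and not taken from the paper, whose actual pipeline may differ.

```python
import numpy as np

def monogenic_components(img):
    """Estimate local amplitude, phase and orientation of a 2-D image via the
    Riesz transform (monogenic signal). Illustrative sketch only; the paper
    derives such components from empirical-mode-decomposition sub-bands."""
    img = img.astype(np.float64)
    rows, cols = img.shape
    # Frequency grids laid out to match np.fft.fft2 (DC component at [0, 0])
    u = np.fft.fftfreq(cols)[None, :]
    v = np.fft.fftfreq(rows)[:, None]
    radius = np.sqrt(u**2 + v**2)
    radius[0, 0] = 1.0  # avoid division by zero at the DC bin
    F = np.fft.fft2(img)
    # First-order Riesz transform pair (x and y directions) in the frequency domain
    r1 = np.real(np.fft.ifft2(F * (-1j * u / radius)))
    r2 = np.real(np.fft.ifft2(F * (-1j * v / radius)))
    amplitude = np.sqrt(img**2 + r1**2 + r2**2)
    phase = np.arctan2(np.sqrt(r1**2 + r2**2), img)
    orientation = np.arctan2(r2, r1)
    return amplitude, phase, orientation

def orientation_fusion_input(frames):
    """Stack orientation maps from several frames or sub-bands into a
    multi-channel array that a CNN could consume (fusion by concatenation)."""
    maps = [monogenic_components(f)[2] for f in frames]
    return np.stack(maps, axis=-1)  # H x W x C tensor
```

As a usage example, `orientation_fusion_input([frame1, frame2, frame3])` would produce an H x W x 3 array of orientation maps that could be fed to a standard CNN in place of raw pixel channels.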

Keywords: Scalable video analytics, Feature fusion, Object orientation, Object recognition, Convolutional neural networks, Cloud-based video analytics

Article history: Received 14 September 2020; Revised 10 February 2021; Accepted 1 March 2021; Available online 27 July 2021; Version of Record 12 August 2021.

DOI: https://doi.org/10.1016/j.patcog.2021.108207