First steps toward CNN based source classification of document images shared over messaging app

Highlights:

• A first-of-its-kind method to classify the source of document images that have undergone processing by a multimedia messaging platform.

• Smartphone Doc Dataset: We introduce a new dataset comprising 770 images of text documents printed in three different fonts and captured using 22 smartphones, and we analyze the performance of the proposed method on it.

• By using a pre-CNN fusion strategy, the proposed method removes the need to design suitable handcrafted features, and it outperforms baseline methods.

• The proposed CNN-based method also outperforms baseline methods in classifying the source smartphones of document images that have undergone a rescaling attack before being shared over WhatsApp.


Review history: Received 12 October 2018, Revised 29 May 2019, Accepted 30 May 2019, Available online 3 June 2019, Version of Record 13 June 2019.