Document zone content classification and its performance evaluation

Authors:

Abstract

This paper describes an algorithm for determining the content type of a given zone within a document image. We take a statistics-based approach and represent each zone by a 25-dimensional feature vector. An optimized decision tree classifier assigns each zone to one of nine zone content classes. A performance evaluation protocol is also proposed. The training and testing data sets comprise a total of 24,177 zones from the University of Washington English Document Image Database III. The algorithm achieves an accuracy of 98.45% with a mean false alarm rate of 0.50%.
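
To make the two reported figures concrete, the following minimal sketch shows one common way to derive an overall accuracy and a mean false alarm rate from a multi-class confusion matrix. The abstract does not define the evaluation protocol, so the definitions used here are assumptions: the false alarm rate of a class is taken as the fraction of zones not belonging to that class that were nevertheless assigned to it, and the mean is an unweighted average over classes. The function name and the three-class toy matrix are hypothetical and are not data from the paper.

```python
from typing import List, Tuple


def accuracy_and_mean_false_alarm(confusion: List[List[int]]) -> Tuple[float, float]:
    """Return (accuracy, mean false alarm rate) for a square confusion matrix.

    confusion[i][j] is the number of zones whose true class is i and whose
    predicted class is j. The false alarm definition is an assumption,
    not necessarily the one used in the paper.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Accuracy: correctly classified zones over all zones.
    correct = sum(confusion[i][i] for i in range(n))

    rates = []
    for j in range(n):
        # Zones of other classes that were wrongly labelled as class j.
        false_positives = sum(confusion[i][j] for i in range(n) if i != j)
        # Zones whose true class is not j.
        negatives = total - sum(confusion[j])
        if negatives > 0:
            rates.append(false_positives / negatives)

    mean_false_alarm = sum(rates) / len(rates) if rates else 0.0
    return correct / total, mean_false_alarm


# Hypothetical three-class toy matrix (illustrative values only).
toy_confusion = [
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
acc, far = accuracy_and_mean_false_alarm(toy_confusion)
print(f"accuracy={acc:.4f}, mean false alarm rate={far:.4f}")
```

Under these assumed definitions a high accuracy and a low mean false alarm rate are related but not redundant: the unweighted mean gives rare classes the same weight as frequent ones, so a classifier that over-assigns zones to one class can raise the false alarm rate of that class without changing accuracy much.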

Keywords: Pattern recognition, Document image analysis, Document layout analysis, Zone content classification, Background analysis, Decision tree classifier, Viterbi algorithm

Article history: Received 24 November 2004, Revised 14 June 2005, Accepted 14 June 2005, Available online 15 August 2005.

Official URL: https://doi.org/10.1016/j.patcog.2005.06.009