Skew detection and block classification of printed documents

Authors:


Abstract

Because the volume of paper-based office documents received each day is overwhelming, document image analysis, which converts paper documents into electronic form, is becoming increasingly important. This paper describes a skew detection method that first smooths the black runs and locates the black–white transitions to emphasize the text lines; the skew angle is then determined by an improved Hough transform. For the block classification step, a rule-based classifier is presented, whose rules are derived from gray-level entropy, block aspect ratio, and run-length analysis. Both methods are evaluated on a test set of 100 different documents. The experiments show that all 100 documents are successfully skew-corrected and that the precision and recall rates of the proposed block classifier are satisfactory.
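
The skew detection pipeline summarized in the abstract (smooth the black runs, keep the black–white transitions, vote with a Hough transform) can be illustrated with a minimal sketch. This is not the paper's improved Hough transform, whose details the abstract does not give. It uses plain Hough voting over a small angle range and assumes a binary image in which 1 means black. The function names and the parameters `max_gap`, `max_angle`, and `step` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def smooth_black_runs(binary, max_gap=3):
    """Fill horizontal white gaps of at most max_gap pixels between black pixels."""
    out = binary.copy()
    for row in out:  # each row is a view, so edits land in `out`
        black = np.flatnonzero(row)
        if black.size < 2:
            continue
        for start, gap in zip(black[:-1], np.diff(black)):
            if 1 < gap <= max_gap + 1:
                row[start + 1:start + gap] = 1
    return out

def bottom_transitions(binary):
    """Mark black pixels whose lower neighbour is white (bottom edge of a run)."""
    edges = np.zeros_like(binary)
    edges[:-1] = (binary[:-1] == 1) & (binary[1:] == 0)
    return edges

def estimate_skew(binary, max_gap=3, max_angle=15.0, step=0.1):
    """Return the skew angle in degrees (assumed convention: positive means
    text lines descend to the right in image coordinates, y pointing down)."""
    edges = bottom_transitions(smooth_black_runs(binary, max_gap))
    ys, xs = np.nonzero(edges)
    if ys.size == 0:
        return 0.0
    diag = int(np.hypot(*binary.shape)) + 1  # offset keeps bin indices >= 0
    best_angle, best_score = 0.0, -1
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        theta = np.deg2rad(angle)
        # Points on one line of slope tan(theta) share the same rho.
        rho = ys * np.cos(theta) - xs * np.sin(theta)
        bins = np.round(rho).astype(int) + diag
        votes = np.bincount(bins, minlength=2 * diag + 1).astype(np.int64)
        # Sum of squared votes rewards angles that concentrate points in few bins.
        score = int((votes ** 2).sum())
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

Scoring each angle by the sum of squared votes, rather than by the single highest bin, is a design choice of this sketch: it uses evidence from every text line at once and is less sensitive to rounding at the bin boundaries.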

Keywords: Block classification, Document analysis, Hough transform, Skew detection

Article history: Received 6 December 1999, Revised 17 November 2000, Accepted 3 December 2000, Available online 24 July 2001.

Paper URL: https://doi.org/10.1016/S0262-8856(00)00098-6