CLASSIFICATION OF MACHINE-PRINTED AND HANDWRITTEN TEXTS USING CHARACTER BLOCK LAYOUT VARIANCE

Abstract

Machine-printed and handwritten texts routinely appear intermixed in several kinds of documents, such as forms. Classifying text as machine-printed or handwritten is therefore a prerequisite for the subsequent optical character recognition (OCR) task. In this paper, we present a method that automatically identifies whether each text block segmented from a document image is machine-printed or handwritten. In our approach, the orientation of a text block is first classified as horizontal or vertical by analyzing the widths of the valleys in its X and Y projection profiles. Next, a reduced X–Y cut algorithm is applied to extract the base blocks from the text block image. Finally, a spatial feature, the character block layout variance, is devised to perform the classification. The method can be applied to both English and Chinese document images. Experimental results demonstrate the feasibility of the proposed method for classifying handwritten and machine-printed texts.
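The abstract names three stages but gives no formulas, so the following is only a minimal Python sketch of how such a pipeline could be wired together, under explicit assumptions. A binary image is assumed to be a list of 0/1 rows. The orientation rule in `orientation` (a wider mean valley in the Y profile means horizontal lines) is a hypothetical rule that assumes multi-line blocks whose inter-line gaps exceed their inter-character gaps. `xy_cut` is a plain recursive X–Y cut, not the paper's reduced variant. `layout_variance` (normalized variance of base-block heights and widths) and the 0.05 threshold in `classify` are hypothetical stand-ins for the paper's character block layout variance.

```python
from statistics import mean, pvariance

# Assumed convention: a binary image is a list of rows, 1 = ink, 0 = background.

def x_profile(img: list[list[int]]) -> list[int]:
    """Ink count per column (projection onto the X axis)."""
    return [sum(col) for col in zip(*img)]

def y_profile(img: list[list[int]]) -> list[int]:
    """Ink count per row (projection onto the Y axis)."""
    return [sum(row) for row in img]

def valley_widths(profile: list[int]) -> list[int]:
    """Widths of zero runs lying between ink runs; outer margins are ignored."""
    widths, run, seen_ink = [], 0, False
    for v in profile:
        if v == 0:
            run += 1
        else:
            if seen_ink and run:
                widths.append(run)
            seen_ink, run = True, 0
    return widths

def orientation(img: list[list[int]]) -> str:
    """Hypothetical rule for multi-line blocks: inter-line gaps are assumed wider
    than inter-character gaps, so the axis with wider valleys separates lines."""
    wx = valley_widths(x_profile(img))
    wy = valley_widths(y_profile(img))
    mx = mean(wx) if wx else 0
    my = mean(wy) if wy else 0
    return "horizontal" if my >= mx else "vertical"

def ink_runs(profile: list[int]) -> list[tuple[int, int]]:
    """Half-open (start, end) index ranges of consecutive nonzero entries."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def xy_cut(img, rows_first=True, top=0, left=0, bottom=None, right=None):
    """Plain recursive X-Y cut (not the paper's reduced variant): split a region
    at blank rows/columns until neither axis has an interior gap, then return
    the tight (top, left, bottom, right) box of each remaining region."""
    bottom = len(img) if bottom is None else bottom
    right = len(img[0]) if right is None else right
    sub = [row[left:right] for row in img[top:bottom]]
    row_runs, col_runs = ink_runs(y_profile(sub)), ink_runs(x_profile(sub))
    if not row_runs:
        return []  # blank region contains no base block
    order = [("rows", row_runs), ("cols", col_runs)]
    if not rows_first:
        order.reverse()  # vertical text: cut into columns before characters
    for axis, runs in order:
        if len(runs) > 1:
            boxes = []
            for a, b in runs:
                if axis == "rows":
                    boxes += xy_cut(img, rows_first, top + a, left, top + b, right)
                else:
                    boxes += xy_cut(img, rows_first, top, left + a, bottom, left + b)
            return boxes
    (r0, r1), (c0, c1) = row_runs[0], col_runs[0]
    return [(top + r0, left + c0, top + r1, left + c1)]

def layout_variance(boxes):
    """Hypothetical proxy for character block layout variance: variance of
    base-block heights and widths, each normalized by its squared mean."""
    heights = [b - t for t, _, b, _ in boxes]
    widths = [r - l for _, l, _, r in boxes]
    return pvariance(heights) / mean(heights) ** 2 + pvariance(widths) / mean(widths) ** 2

def classify(img, threshold=0.05):
    """Hypothetical threshold rule: a regular layout suggests machine-printed text."""
    boxes = xy_cut(img, rows_first=orientation(img) == "horizontal")
    return "machine-printed" if layout_variance(boxes) < threshold else "handwritten"
```

With this proxy, a block of identically sized glyphs on a regular grid scores 0, while blocks whose base blocks vary in size score higher. A practical system would calibrate the threshold on labeled data rather than fix it at 0.05.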

Keywords: Document analysis, Optical character recognition, Projection profile, Character block layout variance

Article history: Received 31 March 1997; revised 21 October 1997; available online 7 June 2001.

Official paper URL: https://doi.org/10.1016/S0031-3203(97)00143-X