Processing of binary images of handwritten text documents

Abstract

This paper deals with three different problems in the processing of binary images of handwritten text documents. First, an integrated algorithm that finds a straight-line approximation of a textual stroke is described. It uses the distance transform of thinned binary images to do three things: identify the spurious bifurcation points that thinning algorithms unavoidably introduce, remove them, and recover the original bifurcation points. The resulting straight-line approximations preserve the structural information of the original pattern. The algorithm does not rely on geometrical properties that are easily distorted. Second, a method is presented for recovering loops that have become blobs through blotting. The method removes the pixels whose distance transform exceeds a calculated threshold. Unfortunately, it seems that such loops cannot be recovered with a high rate of success. The authors suggest that adding thickness information to the line segments that connect the vertices of the straight-line approximations produced by the first algorithm is a step towards solving this problem. Finally, a method is developed to extract lines from pages of handwritten text by finding the shortest spanning tree of a graph formed from the set of main strokes. The main strokes of each extracted line are then arranged in the order in which they were written by following the path that contains them. Every secondary stroke is then assigned to the closest main stroke. The result is an ordered list of main strokes, each with the number of secondary strokes assigned to it. Each combination of a main stroke and its secondary strokes can serve as the input to a subsequent recognition stage. The method proved powerful and better suited to variable handwriting.
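The first algorithm starts from the bifurcation points of a thinned stroke. The sketch below makes several assumptions. The skeleton is one pixel wide and 8-connected, and it is stored as a list of 0/1 rows. The standard crossing-number test then labels end points and bifurcation candidates. The helper names `crossing_number` and `classify_skeleton_points` are hypothetical. The paper's distance-transform rule for deciding which bifurcations are spurious is not reproduced here.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = skeleton pixel, 0 = background
Point = Tuple[int, int]  # (row, column)

# 8-neighbour offsets (dy, dx) in clockwise order, starting at north.
NEIGHBOURS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(skel: Image, y: int, x: int) -> int:
    """Count 0->1 transitions in the circular ring of 8 neighbours.
    Pixels outside the image are treated as background."""
    h, w = len(skel), len(skel[0])
    ring = [
        skel[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
        for dy, dx in NEIGHBOURS
    ]
    # ring[i - 1] wraps to the last neighbour when i == 0, closing the circle.
    return sum(1 for i in range(8) if ring[i - 1] == 0 and ring[i] == 1)


def classify_skeleton_points(skel: Image) -> Tuple[List[Point], List[Point]]:
    """Return (end_points, bifurcation_points) in row-major order."""
    ends: List[Point] = []
    branches: List[Point] = []
    for y, row in enumerate(skel):
        for x, v in enumerate(row):
            if not v:
                continue
            cn = crossing_number(skel, y, x)
            if cn == 1:
                ends.append((y, x))      # stroke terminates here
            elif cn >= 3:
                branches.append((y, x))  # three or more branches meet here
    return ends, branches


if __name__ == "__main__":
    # Skeleton of an X-shaped crossing that thinning split into two junctions.
    x_skeleton = [
        [1, 0, 0, 0, 0, 0, 1],
        [0, 1, 0, 0, 0, 1, 0],
        [0, 0, 1, 1, 1, 0, 0],
        [0, 1, 0, 0, 0, 1, 0],
        [1, 0, 0, 0, 0, 0, 1],
    ]
    ends, branches = classify_skeleton_points(x_skeleton)
    print(branches)  # [(2, 2), (2, 4)]
```

In the demo, the written pattern has a single crossing. The skeleton, however, reports two bifurcation points joined by a short segment, which is the kind of spurious pair the abstract describes. Merging such a pair back into one crossing is the step where the paper's distance-transform test would apply.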
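For the blob-recovery method, the abstract states only that the method removes pixels whose distance transform exceeds a calculated threshold. The sketch below assumes a city-block (4-neighbour) distance transform, computed with the classic two-pass scan. It also takes the threshold from the caller, because the abstract does not say how the paper calculates it. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # 1 = ink (foreground), 0 = paper (background)


def city_block_distance_transform(img: Image) -> List[List[int]]:
    """Two-pass city-block distance from each ink pixel to the nearest
    background pixel; pixels outside the image count as background."""
    h, w = len(img), len(img[0])
    inf = h + w  # larger than any possible city-block distance here
    d = [[inf if img[y][x] else 0 for x in range(w)] for y in range(h)]
    # Forward pass: top-left to bottom-right, using the up and left neighbours.
    for y in range(h):
        for x in range(w):
            if d[y][x]:
                up = d[y - 1][x] if y > 0 else 0
                left = d[y][x - 1] if x > 0 else 0
                d[y][x] = min(d[y][x], up + 1, left + 1)
    # Backward pass: bottom-right to top-left, using the down and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if d[y][x]:
                down = d[y + 1][x] if y < h - 1 else 0
                right = d[y][x + 1] if x < w - 1 else 0
                d[y][x] = min(d[y][x], down + 1, right + 1)
    return d


def remove_thick_pixels(img: Image, threshold: int) -> Image:
    """Keep only ink pixels whose distance value does not exceed threshold."""
    d = city_block_distance_transform(img)
    return [
        [1 if v and d[y][x] <= threshold else 0 for x, v in enumerate(row)]
        for y, row in enumerate(img)
    ]


if __name__ == "__main__":
    blob = [[1] * 7 for _ in range(7)]  # a fully blotted 7 x 7 loop
    for row in remove_thick_pixels(blob, threshold=2):
        print("".join("#" if v else "." for v in row))
    # Rows 2-4 print as "##...##": the 3 x 3 core is removed, leaving a ring.
```

On this idealised square, removing the deepest pixels leaves a two-pixel-wide ring, which is a loop again. On real blotted handwriting, the blob is rarely this regular and the choice of threshold is the hard part. This fits the limited success rate the abstract reports.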
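The line-extraction method builds a shortest spanning tree (more commonly called a minimum spanning tree) over the main strokes, using a cost matrix. It then assigns each secondary stroke to the closest main stroke. The abstract does not define the edge costs. This sketch assumes each stroke is summarised by a centroid and uses the Euclidean distance between centroids as the cost. Prim's algorithm suits a dense cost matrix, and the sketch uses it for that reason. All function names are hypothetical. The sketch does not show how the tree is split into lines or how strokes are ordered along each path.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # assumed stroke centroid (x, y)


def cost_matrix(centroids: List[Point]) -> List[List[float]]:
    """Pairwise Euclidean distances between stroke centroids (assumed cost)."""
    return [[math.dist(p, q) for q in centroids] for p in centroids]


def prim_mst(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Prim's algorithm on a dense cost matrix; returns (parent, child) edges."""
    n = len(cost)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Tuple[int, int]] = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Update the cheapest links for vertices still outside the tree.
        for v in range(n):
            if not in_tree[v] and cost[u][v] < best[v]:
                best[v] = cost[u][v]
                parent[v] = u
    return edges


def assign_secondary(main: List[Point], secondary: List[Point]) -> List[int]:
    """Index of the closest main stroke for each secondary stroke."""
    return [min(range(len(main)), key=lambda i: math.dist(main[i], s)) for s in secondary]


def secondary_counts(n_main: int, assignment: List[int]) -> List[int]:
    """Number of secondary strokes assigned to each main stroke."""
    counts = [0] * n_main
    for i in assignment:
        counts[i] += 1
    return counts
```

As a usage example, consider main-stroke centroids `(0, 0), (10, 0), (20, 0)` on one line and `(0, 50), (10, 50)` on the next. The tree then links strokes within each line by 10-unit edges and joins the two lines through a single 50-unit edge. A dot at `(10, -6)` is assigned to stroke 1, and one at `(19, 48)` is assigned to stroke 4.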

Keywords: Handwritten text, Printed text, Main stroke, Secondary stroke, Thinning, Straight line approximation, Distance transform, Blotting, Blobs, Cost matrix, Shortest spanning tree

Article history: Received 29 September 1994, revised 8 September 1995, accepted 16 October 1995, available online 7 June 2001.

Official article page: https://doi.org/10.1016/0031-3203(95)00142-5