Field data extraction for form document processing using a gravitation-based algorithm

Abstract

This paper presents a novel approach to grouping Chinese handwritten field data filled in on form documents, using a gravitation-based algorithm. The algorithm extracts handwritten field data even when the writing extends beyond the boundaries of its form field. First, form lines are extracted and removed from the input form image. Connected components are then detected in the remaining data, and the gravitation acting on each connected component is computed, using its black-pixel count as its mass. Each connected component is then moved according to its gravitation. Filled-in data generally have a locality property: data belonging to the same field are normally written consecutively within a local area. The relationships among connected components can therefore be determined from this property. Repeatedly moving the connected components according to their neighboring components determines which connected components should be extracted for a particular field. Experimental results demonstrate the effectiveness of the proposed method in grouping field data.
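
The grouping idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the update rule here (each component shifts toward its neighbours, with each neighbour's offset weighted by mass over squared distance) is an assumption, as are the function names `gravity_step` and `group_components` and the parameters `radius`, `step`, `iterations`, and `merge_dist`. Components are assumed to be already extracted and summarised as `(x, y, mass)` tuples, where `mass` is the black-pixel count.

```python
import math

def gravity_step(comps, radius=50.0, step=0.5):
    """Move every component once; comps is a list of (x, y, mass) tuples."""
    moved = []
    for i, (xi, yi, mi) in enumerate(comps):
        wsum = sx = sy = 0.0
        for j, (xj, yj, mj) in enumerate(comps):
            dx, dy = xj - xi, yj - yi
            d = math.hypot(dx, dy)
            # Locality (assumed cutoff): only neighbours within `radius` attract.
            # Components that already coincide exert no further pull.
            if i == j or d < 1e-9 or d > radius:
                continue
            w = mj / (d * d)  # pull per unit mass from neighbour j
            wsum += w
            sx += w * dx
            sy += w * dy
        if wsum > 0:
            # Shift a fraction `step` toward the gravity-weighted neighbour position.
            xi += step * sx / wsum
            yi += step * sy / wsum
        moved.append((xi, yi, mi))
    return moved

def group_components(comps, iterations=20, merge_dist=2.0, radius=50.0):
    """Repeatedly move components, then group those that ended up together."""
    pts = list(comps)
    for _ in range(iterations):
        pts = gravity_step(pts, radius=radius)

    parent = list(range(len(pts)))  # union-find over component indices

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            gap = math.hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1])
            if gap <= merge_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(pts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical input: three strokes written close together, two more far away.
sample = [(10, 10, 30), (18, 10, 45), (26, 10, 28), (200, 12, 40), (207, 12, 35)]
groups = group_components(sample)
```

In this sketch the two clusters never interact because they are farther apart than `radius`, so each collapses onto its own point and is reported as one group. Assigning each group to a specific form field, and handling strokes that cross removed form lines, are outside the scope of the example.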

Keywords: Form document processing, Field-data grouping, Gravitation-based algorithm, Connected-component, Locality property

Review history: Received 6 June 1999, Revised 24 April 2000, Accepted 20 June 2000, Available online 10 July 2001.

Official paper URL: https://doi.org/10.1016/S0031-3203(00)00115-1