RECOGNITION AND DATA EXTRACTION OF FORM DOCUMENTS BASED ON THREE TYPES OF LINE SEGMENTS

Authors:

Highlights:

Abstract

Almost all form documents contain line segments. In this paper, we propose an efficient method for recognizing any form document that contains at least one line segment. The method is based on an efficient representation model that uses three types of line segments to represent a form. All line segments are normalized and sorted after extraction. Normalization and sorting not only solve the form scaling problem but also provide a unified and efficient way of matching forms against one another. To make recognition more robust, fuzzy matching is used. With this representation model, recognizing a skewed form requires rotating only the line segments and the data fields rather than the whole form image. Experimental results show the effectiveness and efficiency of the method.
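
The abstract gives no algorithmic detail, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes a hypothetical `Segment` record with an opaque type label (`"h"` and `"v"` are placeholders; the abstract does not name the three types), bounding-box scaling as the normalization step, a coordinate tolerance `tol` as the fuzzy-matching criterion, and a plain rotation of endpoints for skew correction.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Segment:
    kind: str   # hypothetical type label, e.g. "h" or "v" (assumption)
    x1: float
    y1: float
    x2: float
    y2: float


def normalize(segments):
    """Scale endpoints into the unit box of all segments, then sort them."""
    xs = [x for s in segments for x in (s.x1, s.x2)]
    ys = [y for s in segments for y in (s.y1, s.y2)]
    x0, y0 = min(xs), min(ys)
    w = (max(xs) - x0) or 1.0   # avoid division by zero for degenerate boxes
    h = (max(ys) - y0) or 1.0
    out = []
    for s in segments:
        p = ((s.x1 - x0) / w, (s.y1 - y0) / h)
        q = ((s.x2 - x0) / w, (s.y2 - y0) / h)
        p, q = sorted((p, q))   # canonical endpoint order
        out.append(Segment(s.kind, p[0], p[1], q[0], q[1]))
    return sorted(out)          # canonical segment order


def fuzzy_match(a, b, tol=0.02):
    """Share of segments paired with a same-kind segment within tol."""
    if not a or not b:
        return 0.0
    unused = list(b)
    hits = 0
    for s in a:
        for t in unused:
            diff = max(abs(s.x1 - t.x1), abs(s.y1 - t.y1),
                       abs(s.x2 - t.x2), abs(s.y2 - t.y2))
            if s.kind == t.kind and diff <= tol:
                hits += 1
                unused.remove(t)   # each segment in b is used at most once
                break
    return hits / max(len(a), len(b))


def rotate(segments, angle_deg, cx=0.0, cy=0.0):
    """Rotate only the segment endpoints about (cx, cy), not the image."""
    c = math.cos(math.radians(angle_deg))
    sn = math.sin(math.radians(angle_deg))

    def rot(x, y):
        dx, dy = x - cx, y - cy
        return (cx + c * dx - sn * dy, cy + sn * dx + c * dy)

    return [Segment(s.kind, *rot(s.x1, s.y1), *rot(s.x2, s.y2))
            for s in segments]
```

Under these assumptions, a template and a uniformly scaled, translated copy of it normalize to the same segment list, so `fuzzy_match(normalize(template), normalize(copy))` yields 1.0, while a form with missing or extra lines scores lower. Rotating a handful of endpoints costs far less than resampling every pixel of a scanned page, which is the efficiency argument the abstract makes for skewed forms.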

Keywords: Automatic form processing, Document analysis, Representation model for forms, Form recognition, Data extraction

Review history: Received 21 August 1997, Revised 24 December 1997, Available online 7 June 2001.

Official paper URL: https://doi.org/10.1016/S0031-3203(98)00007-7