On the generalization of the form identification and skew detection problem

Abstract

A new method is proposed for the document identification and skew detection problem. It applies to a widely used subclass of documents that resemble application forms in style. Unlike other approaches, it makes no assumptions about the nature or style of the printed form; the aim is to solve the problem in the most general sense. In particular, the method does not rely on special features such as patterns of line crossings, dominant lines, or special symbols found only on specially designed forms. The power spectral density (PSD) of the form's horizontal projection profile serves as a shift-invariant feature vector. The Karhunen–Loeve transform (KLT) is used to decorrelate the feature vectors in the training set and to reduce their length. Training is done in such a way that the unknown form need not be rotated during recognition. The eigenvectors of the covariance matrix of the training-set PSDs, together with learning vector quantization (LVQ), are used for training, and the Euclidean distance is used for recognition. A limitation on the amount of skew the system can handle is alleviated by combining it with a known skew detection method.
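The feature pipeline described above (projection profile, PSD, KLT, LVQ, nearest-prototype recognition) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`projection_profile`, `power_spectral_density`, `fit_klt`, `train_lvq1`, `classify`, `make_form`), the periodogram normalization |DFT|²/N, the number of retained eigenvectors `k`, the choice of the basic LVQ1 variant with a linearly decaying learning rate, and the toy "forms" made of horizontal rules. The PSD is exactly invariant only to *circular* shifts of the profile. For a real page translation it is approximately invariant. The skew handling (training on rotated samples, auxiliary skew detector) is not shown.

```python
import numpy as np

def projection_profile(image):
    """Horizontal projection profile: number of foreground (nonzero) pixels per row."""
    return (np.asarray(image) != 0).sum(axis=1).astype(float)

def power_spectral_density(profile):
    """Periodogram |DFT|^2 / N of the profile (one-sided, via rfft).

    The DFT magnitude is unchanged by a circular shift of the input,
    so this feature ignores cyclic vertical displacement of the form.
    """
    profile = np.asarray(profile, dtype=float)
    return np.abs(np.fft.rfft(profile)) ** 2 / profile.size

def fit_klt(features, k):
    """Karhunen-Loeve transform: mean and top-k eigenvectors of the covariance."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)        # np.cov centers the data itself
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]
    return mean, eigvecs[:, top]

def klt_transform(features, mean, basis):
    """Project centered feature vectors onto the retained eigenvectors."""
    return (features - mean) @ basis

def train_lvq1(x, y, per_class=1, lr=0.05, epochs=30, seed=0):
    """LVQ1 (assumed variant): pull the nearest prototype toward same-class
    samples and push it away from other-class samples."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    init = [rng.choice(np.flatnonzero(y == c), per_class, replace=False) for c in classes]
    protos = np.vstack([x[idx] for idx in init]).astype(float)
    labels = np.repeat(classes, per_class)
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)     # linearly decaying learning rate
        for i in rng.permutation(len(x)):
            j = np.argmin(np.linalg.norm(protos - x[i], axis=1))
            step = alpha * (x[i] - protos[j])
            protos[j] += step if labels[j] == y[i] else -step
    return protos, labels

def classify(x, protos, labels):
    """Assign each vector the label of its nearest prototype (Euclidean distance)."""
    dists = np.linalg.norm(x[:, None, :] - protos[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]

# Toy demo: two hypothetical form layouts, each defined by the rows of its horizontal rules.
def make_form(rule_rows, height=64, width=32):
    img = np.zeros((height, width))
    img[rule_rows, :] = 1
    return img

rng = np.random.default_rng(1)
layouts = {0: [5, 10, 30], 1: [8, 20, 21, 50]}
images, y = [], []
for label, rows in layouts.items():
    for _ in range(10):
        shift = int(rng.integers(0, 64))        # random cyclic vertical shift
        images.append(np.roll(make_form(rows), shift, axis=0))
        y.append(label)
y = np.array(y)

psd = np.array([power_spectral_density(projection_profile(im)) for im in images])
mean, basis = fit_klt(psd, k=2)
z = klt_transform(psd, mean, basis)
protos, labels = train_lvq1(z, y)
pred = classify(z, protos, labels)
```

Because every shifted copy of a layout has the same PSD, the shifted training forms collapse onto a single point per class in the reduced KLT space. This is why no shift search is needed at recognition time in this toy setting.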

Keywords: Form identification, Skew detection, Shift detection, Power spectrum, Karhunen–Loeve transformation, Learning vector quantization

Publication history: Received 11 June 1999, Accepted 5 January 2001, Available online 17 October 2001.

Paper URL: https://doi.org/10.1016/S0031-3203(01)00030-9