Font and Function Word Identification in Document Recognition

Authors:

Abstract

An algorithm is presented that identifies the predominant font in which the running text of an English-language document is printed. Frequent function words (such as the, of, and, a, and to) are also recognized as part of the font identification. Clusters of word images are generated from an input document and matched against a database of function words derived from fonts and document images. The font or document that matches best provides the identification of the predominant font and the function words. This technique takes advantage of the fact that most machine-printed documents are prepared with a single predominant font. In addition, the repeated words in the document are used to overcome noise in the input. Advantages of this technique include its use as a preprocessing step for a document recognition algorithm. Experimental results show that high accuracy is achieved on a database of original and degraded document images.
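
The pipeline described in the abstract (cluster word images, match the clusters against per-font function-word templates, pick the best-matching font) might be sketched roughly as follows. This is a minimal illustration under strong assumptions and is not the authors' implementation: word images are taken to be fixed-size flattened binary bitmaps, similarity is plain Hamming distance, clustering is a greedy single pass, and the names `cluster_words`, `prototype`, `identify_font`, and `font_db` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: every word image is a flattened binary bitmap of the same length.
Bitmap = Tuple[int, ...]


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Number of pixel positions at which two bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_words(images: List[Bitmap], max_dist: int) -> List[List[Bitmap]]:
    """Greedy clustering: join the first cluster whose seed image is close enough."""
    clusters: List[List[Bitmap]] = []
    for img in images:
        for members in clusters:
            if hamming(img, members[0]) <= max_dist:
                members.append(img)
                break
        else:
            clusters.append([img])
    return clusters


def prototype(members: List[Bitmap]) -> Bitmap:
    """Per-pixel majority vote, so repeated words average out isolated noise."""
    n = len(members)
    return tuple(int(2 * sum(column) > n) for column in zip(*members))


def identify_font(
    images: List[Bitmap],
    font_db: Dict[str, Dict[str, Bitmap]],  # font name -> {function word: template}
    max_dist: int = 1,
    top_k: int = 2,
) -> Tuple[str, List[str]]:
    """Return the best-matching font and a word label for each of the largest clusters."""
    # Function words are frequent, so only the largest clusters are matched.
    largest = sorted(cluster_words(images, max_dist), key=len, reverse=True)[:top_k]
    protos = [prototype(c) for c in largest]

    best_font, best_cost, best_labels = "", None, []
    for font, templates in font_db.items():
        labels, cost = [], 0
        for p in protos:
            # Nearest function-word template of this font for the cluster prototype.
            word, dist = min(
                ((w, hamming(p, t)) for w, t in templates.items()),
                key=lambda pair: pair[1],
            )
            labels.append(word)
            cost += dist
        if best_cost is None or cost < best_cost:
            best_font, best_cost, best_labels = font, cost, labels
    return best_font, best_labels
```

The font with the lowest total template distance is reported as predominant, and the word labels come for free from the same match, which mirrors the abstract's point that function words are recognized as part of font identification. A realistic system would need size normalization and a more robust image distance than the ones assumed here.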

Keywords:

Article history: Received 4 November 1993, Accepted 3 October 1994, Available online 22 April 2002.

Official paper URL: https://doi.org/10.1006/cviu.1996.0005