Using Consensus Sequence Voting to Correct OCR Errors

Abstract

We present experimental results suggesting that between 20% and 50% of the errors caused by a single OCR package can be eliminated by simply scanning a page three times and running a “consensus sequence” voting procedure. This technique, which originates from molecular biology, takes exponential time in general, but can be specialized to a fast heuristic guaranteed to be optimal for the cases of interest. The improvement in recognition accuracy is achieved without making a priori assumptions about the distribution of OCR errors (i.e., no “training” is required).
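To make the idea of consensus voting over three scans concrete, here is a minimal sketch under stated assumptions. It is not the authors' fast heuristic, which the abstract does not describe. Instead it uses the generic exact approach: a three-way dynamic-programming alignment with unit sum-of-pairs costs (mismatch or character-versus-gap costs 1, match or gap-versus-gap costs 0), followed by a majority vote in each aligned column. The function names `pair_cost`, `align3`, and `consensus` are hypothetical, and so is the tie-break rule (when all three scans disagree, keep the first scan's symbol). The exact table has (|a|+1)(|b|+1)(|c|+1) cells. For k inputs it grows as n^k, which is why the exact method becomes exponential as more scans are added.

```python
from collections import Counter
from itertools import product


def pair_cost(x, y):
    """Sum-of-pairs cost for one pair of column entries (None = gap)."""
    if x is None and y is None:
        return 0  # gap aligned with gap costs nothing
    return 0 if x == y else 1  # mismatch or char-vs-gap costs 1


def align3(a, b, c):
    """Exact three-way alignment by dynamic programming over (i, j, k)."""
    INF = float("inf")
    n1, n2, n3 = len(a), len(b), len(c)
    cost = [[[INF] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    back = {}
    cost[0][0][0] = 0
    # Each move consumes one character from a non-empty subset of the strings.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                if i == j == k == 0:
                    continue
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    col = (a[pi] if di else None,
                           b[pj] if dj else None,
                           c[pk] if dk else None)
                    step = (pair_cost(col[0], col[1]) + pair_cost(col[0], col[2])
                            + pair_cost(col[1], col[2]))
                    total = cost[pi][pj][pk] + step
                    if total < cost[i][j][k]:
                        cost[i][j][k] = total
                        back[(i, j, k)] = ((pi, pj, pk), col)
    # Trace back from the full-length cell to recover the aligned columns.
    columns, cell = [], (n1, n2, n3)
    while cell != (0, 0, 0):
        cell, col = back[cell]
        columns.append(col)
    columns.reverse()
    return columns


def consensus(a, b, c):
    """Majority vote per aligned column; a gap majority drops the column."""
    out = []
    for col in align3(a, b, c):
        symbol, votes = Counter(col).most_common(1)[0]
        if votes == 1:
            symbol = col[0]  # all three disagree: keep the first scan (assumption)
        if symbol is not None:
            out.append(symbol)
    return "".join(out)


if __name__ == "__main__":
    print(consensus("tbe quick", "the qu1ck", "the quick"))  # prints "the quick"
```

In the usage line, each scan contains at most one error, and the errors fall in different positions, so every column has a two-out-of-three majority for the correct character. Voting cannot help when two scans make the same mistake at the same place.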

Review history: Received 23 January 1995, Accepted 3 January 1996, Available online 19 April 2002.

Paper URL: https://doi.org/10.1006/cviu.1996.0502