A SCRIPT-INDEPENDENT METHODOLOGY FOR OPTICAL CHARACTER RECOGNITION

Abstract

We present a methodology for OCR with the following properties: the feature extraction, training, and recognition components are script-independent; there is no separate segmentation at the character and word levels; and training is performed automatically on data that is likewise not presegmented. The methodology is adapted to OCR from continuous speech recognition, which has developed a mature and successful technology based on Hidden Markov Models. The script independence of the methodology is demonstrated using omnifont experiments on the DARPA Arabic OCR Corpus and the University of Washington English Document Image Database I.

Keywords: Optical character recognition, Speech recognition, Hidden Markov models, Segmentation-free recognition, Script independence, Arabic OCR

Article history: Received 26 February 1997, Revised 15 September 1997, Available online 7 June 2001.

Official paper URL: https://doi.org/10.1016/S0031-3203(97)00152-0