Multi-level post-processing for Korean character recognition using morphological analysis and linguistic evaluation

Authors:

Highlights:

Abstract

Most post-processing methods for character recognition rely on contextual information at the character and word-fragment levels. However, because of the linguistic characteristics of Korean, such low-level information alone is not sufficient for high-quality character-recognition applications, and much higher-level contextual information is needed to improve the recognition results. This paper presents a domain-independent post-processing technique that uses multi-level morphological, syntactic, and semantic information in addition to character-level information. The proposed post-processing system works in three levels: candidate character-set selection, candidate eojeol (Korean word) generation through morphological analysis, and selection of a single final eojeol sequence by linguistic evaluation. All the required linguistic information and probabilities are acquired automatically from statistical corpus analysis. Experimental results demonstrate the effectiveness of the method: for single best-solution selection, it yields an error correction rate of 80.46% and raises the recognition rate from 71.2% before post-processing to 95.53%.
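
The three levels described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy lexicon, the recognizer scores, and the bigram log-probabilities are all hypothetical, a plain set lookup stands in for real morphological analysis, and a simple bigram score stands in for the paper's linguistic evaluation. It also assumes every eojeol position has at least one candidate accepted by the lexicon.

```python
import math
from itertools import product


def select_candidates(char_scores, k=2):
    """Level 1: keep the k best-scoring characters at each position."""
    return [sorted(pos, key=lambda cs: cs[1], reverse=True)[:k]
            for pos in char_scores]


def generate_eojeols(candidates, lexicon):
    """Level 2: combine candidate characters into eojeol candidates.

    A set lookup stands in for morphological analysis (an assumption);
    each accepted eojeol carries the sum of its character log-scores.
    """
    results = []
    for combo in product(*candidates):
        word = "".join(ch for ch, _ in combo)
        if word in lexicon:
            results.append((word, sum(math.log(s) for _, s in combo)))
    return results


def select_sequence(eojeol_lists, bigram_logp, default=-10.0):
    """Level 3: Viterbi-style search for the best eojeol sequence.

    bigram_logp maps (previous, current) eojeol pairs to hypothetical
    log-probabilities; unseen pairs receive the `default` penalty.
    """
    best = {w: (s, [w]) for w, s in eojeol_lists[0]}
    for options in eojeol_lists[1:]:
        nxt = {}
        for w, s in options:
            nxt[w] = max(
                ((ps + bigram_logp.get((pw, w), default) + s, path + [w])
                 for pw, (ps, path) in best.items()),
                key=lambda t: t[0],
            )
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


def demo():
    # Toy recognizer output for two eojeols (all values are made up).
    raw = [
        [[("학", 0.6), ("확", 0.4)], [("교", 0.5), ("고", 0.5)]],
        [[("긴", 0.55), ("간", 0.45)], [("다", 0.9)]],
    ]
    lexicon = {"학교", "확고", "간다", "긴다"}
    bigram_logp = {("학교", "간다"): -1.0}
    eojeol_lists = [generate_eojeols(select_candidates(e), lexicon)
                    for e in raw]
    return select_sequence(eojeol_lists, bigram_logp)


if __name__ == "__main__":
    print(demo())  # ['학교', '간다']
```

In this toy run the recognizer slightly prefers 긴 over 간 in the second eojeol, but the sequence-level score overrides that choice, which is the kind of correction that character-level context alone cannot make.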

Keywords: Korean character recognition, Post-processing, Morphological analysis, Part-of-speech tagging, Co-occurrence patterns, Linguistic evaluation

Review history: Received 16 July 1996, Revised 1 August 1996, Revised 15 October 1996, Available online 7 June 2001.

Paper URL: https://doi.org/10.1016/S0031-3203(96)00156-2