Document retrieval tolerating character recognition errors—evaluation and application

Abstract

This paper presents two methods of combining character recognition with techniques for retrieving Japanese documents and shows how these methods can be applied to textual image retrieval. Both retrieval methods tolerate errors that occur during character recognition. The basic idea is to exploit the characteristics of recognition errors. The first method uses a confusion matrix to generate “equivalent” query strings that should match erroneously recognized text. The second searches a “non-deterministic text” that contains multiple candidates for ambiguous recognition results. Simulation experiments have shown that both methods can effectively combine character recognition with retrieval techniques.
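The two ideas described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the confusion table `CONFUSIONS`, its character pairs, and the function names `expand_query`, `search_expanded`, and `search_candidates` are all hypothetical, and a real confusion matrix would presumably carry error frequencies rather than a plain list of look-alikes.

```python
from itertools import product

# Hypothetical confusion table (illustrative values only): each true character
# maps to characters the recognizer is assumed to output in its place.
CONFUSIONS = {
    "大": ["犬", "太"],
    "学": ["字"],
}


def expand_query(query, confusions):
    """Extended query-term idea: list every string the recognizer might
    have produced for `query`, including the query itself."""
    options = [[ch] + confusions.get(ch, []) for ch in query]
    return ["".join(chars) for chars in product(*options)]


def search_expanded(query, recognized_text, confusions):
    """Return True if any expanded variant occurs in the recognized text."""
    return any(v in recognized_text for v in expand_query(query, confusions))


def search_candidates(query, candidate_text):
    """Multiple-candidate idea: `candidate_text` holds one set of candidate
    characters per position. Return every start position where each query
    character appears among the candidates at the corresponding position."""
    n = len(query)
    return [
        i
        for i in range(len(candidate_text) - n + 1)
        if all(query[j] in candidate_text[i + j] for j in range(n))
    ]


# Usage: the first call searches single-best recognition output,
# the second searches per-position candidate sets.
found = search_expanded("大学", "東京犬字の", CONFUSIONS)
positions = search_candidates(
    "大学", [{"東"}, {"京"}, {"大", "犬"}, {"学", "字"}]
)
```

In this sketch the first approach leaves the stored text untouched and does its extra work at query time, where the number of variants grows multiplicatively with query length. The second keeps the query unchanged and instead stores more information per character position.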

Keywords: Japanese, Character recognition, Document retrieval, Recognition error, Confusion matrix, Extended query-term method, Non-deterministic text, Multiple-candidate method

Article history: Received 16 July 1996, Available online 7 June 2001.

Paper URL: https://doi.org/10.1016/S0031-3203(96)00155-0