Effects of OCR errors on ranking and feedback using the vector space model

Authors:

Highlights:

Abstract

We report on the performance of the vector space model in the presence of OCR errors. We show that average precision and recall are not affected for our full-text document collection when the OCR version is compared to its corresponding corrected set. We do, however, see divergence between the relevant document rankings of the OCR and corrected collections under different weighting combinations. In particular, we observed that cosine normalization plays a considerable role in the disparity seen between the collections. Furthermore, we show that even though feedback improves retrieval for both collections, it cannot be used to compensate for OCR errors caused by badly degraded documents.
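
One way to see why cosine normalization could separate the two collections is the minimal sketch below. It is an illustration only, not the weighting scheme used in the paper: it assumes raw term-frequency weights, whitespace tokenization, and made-up example texts, and the function names (`tf_vector`, `inner_product`, `cosine_score`) are hypothetical. Spurious tokens produced by OCR leave the unnormalized inner product with the query unchanged but increase the document vector's length, which lowers the cosine-normalized score.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector over whitespace tokens (an assumed weighting)."""
    return Counter(text.lower().split())


def inner_product(query, doc):
    """Unnormalized similarity: sum of products of matching term weights."""
    return sum(weight * doc.get(term, 0) for term, weight in query.items())


def cosine_score(query, doc):
    """Inner product divided by the Euclidean lengths of both vectors."""
    q_norm = math.sqrt(sum(w * w for w in query.values()))
    d_norm = math.sqrt(sum(w * w for w in doc.values()))
    if q_norm == 0 or d_norm == 0:
        return 0.0
    return inner_product(query, doc) / (q_norm * d_norm)


# Hypothetical texts: the OCR version carries three extra garbage tokens.
corrected = tf_vector("optical character recognition errors affect document retrieval")
ocr = tf_vector(
    "optical character recognition errors affect document retrieval "
    "tlie rn0del c0llecti0n"
)
query = tf_vector("document retrieval")

# Same matching terms, so the unnormalized scores agree.
print(inner_product(query, corrected), inner_product(query, ocr))

# The longer OCR vector is penalized once scores are cosine-normalized.
print(cosine_score(query, corrected), cosine_score(query, ocr))
```

In this toy case both versions match the query on the same two terms, yet the OCR version scores lower after normalization, so it could fall in a ranking relative to documents that were recognized cleanly.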

Keywords:

Review history: Available online 23 February 1999.

Paper URL: https://doi.org/10.1016/0306-4573(95)00058-5