Batch-adaptive rejection threshold estimation with application to OCR post-processing

Authors:

Highlights:

• OCR strings are post-processed to comply with language constraints.

• A user-defined target error rate is met on a batch of OCR post-processed strings.

• The rejection threshold is automatically estimated from the target error rate.

• The threshold can be estimated for language models using real or synthetic samples.

• Accurate error rate estimation on test samples having different confidence distributions.
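The highlights state that the rejection threshold is estimated from a user-defined target error rate. As a rough illustration of that general idea only, and not the paper's batch-adaptive method based on weighted finite-state transducers and error vs. cost curves, the sketch below picks a threshold from labeled validation samples. It assumes each post-processed string has a scalar confidence, that a string is accepted when its confidence is at least the threshold, and that the error rate is measured over accepted strings. The function name and parameters are hypothetical.

```python
from typing import Sequence


def estimate_rejection_threshold(
    confidences: Sequence[float],
    is_correct: Sequence[bool],
    target_error_rate: float,
) -> float:
    """Return the lowest threshold whose accepted set meets the target error rate.

    Assumption: a sample is accepted when confidence >= threshold, and the
    error rate is (wrong accepted samples) / (accepted samples).
    Returns float("inf") (reject everything) if no threshold meets the target.
    """
    if len(confidences) != len(is_correct):
        raise ValueError("confidences and is_correct must have the same length")

    # Rank samples from most to least confident.
    ranked = sorted(zip(confidences, is_correct), key=lambda p: p[0], reverse=True)

    best_threshold = float("inf")
    errors = 0
    i = 0
    n = len(ranked)
    while i < n:
        # Samples with equal confidence are accepted or rejected together.
        j = i
        while j < n and ranked[j][0] == ranked[i][0]:
            if not ranked[j][1]:
                errors += 1
            j += 1
        accepted = j  # number of samples with confidence >= ranked[i][0]
        if errors / accepted <= target_error_rate:
            # Lower thresholds accept more samples, so keep the lowest valid one.
            best_threshold = ranked[i][0]
        i = j
    return best_threshold
```

For example, with confidences `[0.9, 0.8, 0.7, 0.6, 0.5]`, correctness labels `[True, True, False, True, False]`, and a target error rate of 0.25, this sketch returns 0.6: accepting the four most confident strings yields one error in four. A threshold estimated this way is only as good as the match between the validation and test confidence distributions, which is the issue the batch-adaptive estimation in the title is meant to address.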


Keywords: Rejection threshold, OCR post-processing, Language models, Weighted finite-state transducers, Error vs. cost curve, Cumulative error vs. cost curve, OCR error-generation model

Publication history: Available online 24 June 2015; Version of Record 13 July 2015.

Official URL: https://doi.org/10.1016/j.eswa.2015.06.022