CONFIRM – Clustering of noisy form images using robust matching

Authors:

Highlights:

• A clustering framework is proposed for clustering noisy form images.

• Novel algorithms for matching text lines and rule lines are introduced.

• We show a 44% improvement over the state of the art on 5 datasets of historical forms.

• Sampling and bootstrapping are employed for scalability to large datasets.


Keywords: Form processing, Document analysis, Document image clustering, Historical document processing, Clustering

Review history: Received 2 March 2017, Revised 3 July 2018, Accepted 7 October 2018, Available online 8 October 2018, Version of Record 12 October 2018.

Official paper URL: https://doi.org/10.1016/j.patcog.2018.10.004