Text baseline detection, a single page trained system

Authors:

Highlights:

• A fast text baseline detection method that is robust to noisy manuscripts is proposed.

• The local minima of the text contours are considered as interest points.

• An Extremely Randomized Trees forest is used to classify the interest points.

• A modified version of DBSCAN is used to cluster these points into baselines.

• Automatically estimating the baselines of a page takes roughly 3 s on average.

Abstract

This paper proposes a fast text baseline detection method that is robust to noisy manuscripts. The local minima of the text contours are taken as interest points, an Extremely Randomized Trees forest classifies these points, and a modified version of DBSCAN clusters them into baselines. Automatically estimating the baselines of a page takes roughly 3 s on average.
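
The pipeline summarized above can be illustrated with a minimal sketch of two of its stages: picking local minima on a contour and grouping the surviving points. Everything here is an assumption made for illustration, not the authors' implementation. Contours are taken to be closed lists of (x, y) points in image coordinates (y grows downward, so the lowest point has the largest y), the function names `lower_local_minima` and `dbscan` are hypothetical, and the clustering is plain DBSCAN because the paper's modification is not described in this summary. The classification stage is omitted; a classifier such as scikit-learn's `ExtraTreesClassifier` could filter the interest points between the two steps.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def lower_local_minima(contour: List[Point]) -> List[Point]:
    """Return contour points that lie strictly lower than both neighbours.

    Assumption: image coordinates, so "lower" means a larger y value.
    The contour is treated as closed; flat bottoms (plateaus) are skipped.
    """
    n = len(contour)
    minima = []
    for i in range(n):
        y_prev = contour[i - 1][1]        # index -1 wraps to the last point
        y_here = contour[i][1]
        y_next = contour[(i + 1) % n][1]
        if y_here > y_prev and y_here > y_next:
            minima.append(contour[i])
    return minima


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN (not the modified variant); -1 marks noise."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbours(i: int) -> List[int]:
        # Indices of all points within eps of point i (including i itself).
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster       # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)   # j is a core point: keep expanding
    return [label if label is not None else -1 for label in labels]


if __name__ == "__main__":
    contour = [(0, 0), (1, 2), (2, 0), (3, 3), (4, 0)]
    print(lower_local_minima(contour))

    candidates = [(0, 10), (5, 10), (10, 11), (0, 50), (5, 51), (10, 50), (100, 100)]
    print(dbscan(candidates, eps=8, min_pts=2))
```

In this toy input the candidate points form two roughly horizontal rows plus one isolated point, so the clustering separates two baseline-like groups and marks the outlier as noise. The values of `eps` and `min_pts` are arbitrary and would need tuning to the script size and image resolution.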

Keywords:

Review history: Received 25 October 2017, Revised 9 May 2019, Accepted 18 May 2019, Available online 28 May 2019, Version of Record 29 May 2019.

Official paper URL: https://doi.org/10.1016/j.patcog.2019.05.031