A generalized line segmentation method for multi-script handwritten text documents

Authors:

Highlights:

Abstract

Segmenting handwritten document images into text lines is a crucial stage in unconstrained handwritten document recognition. In the Indian subcontinent, where various scripts are used for communication, a system for multi-script handwritten text-line segmentation is essential. This paper presents a multi-script text-line segmentation algorithm based on newly developed light projection, start point detection, and boundary tracking methods. The proposed approach is capable of overcoming most of the hindrances faced by state-of-the-art methods. Experiments were performed on our proposed Bangla handwritten document image dataset, WBSUBNdb_text, and on a variety of well-known public handwritten datasets, namely CMATERdb, PhDIndic_11, KHATT, HIT-MW, the ISI Bengali Writer Identification/Verification dataset, the ICDAR 2013 segmentation contest dataset, and the ICDAR 2013 writer identification contest benchmark dataset, and yielded promising results.

Keywords: Unconstrained handwriting, Light projection, Start point detection, Boundary tracking, Text line segmentation, Filling and smoothing

Article history: Received 5 March 2021, Revised 25 October 2021, Accepted 8 August 2022, Available online 23 August 2022, Version of Record 7 September 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.118498