Out of vocabulary word detection and recovery in Arabic handwritten text recognition

Highlights:

• A novel two-step out-of-vocabulary (OOV) word detection and recovery method is proposed.

• The proposed method is generic and independent of the recognition engine.

• The proposed method uses various sub-lexical models to improve the detection step.

• The recovery process relies on dynamic lexicons built from large text corpora.

• The proposed method significantly improves the recognition results.

Keywords: Arabic handwriting recognition, Out-of-vocabulary detection and recovery, Static lexicon, Dynamic lexicon, Statistical language model, Deep learning, Multi-dimensional long short-term memory network

Review history: Received 10 July 2018, Revised 18 April 2019, Accepted 1 May 2019, Available online 1 May 2019, Version of Record 10 May 2019.

Paper URL: https://doi.org/10.1016/j.patcog.2019.05.003