Split, Embed and Merge: An accurate table structure recognizer

作者:

Highlights:

摘要

Table structure recognition is an essential part for making machines understand tables. Its main task is to recognize the internal structure of a table. However, due to the complexity and diversity in their structure and style, it is very difficult to parse the tabular data into the structured format which machines can understand, especially for complex tables. In this paper, we introduce Split, Embed and Merge (SEM), an accurate table structure recognizer. SEM is mainly composed of three parts, splitter, embedder and merger. In the first stage, we apply the splitter to predict the potential regions of the table row/column separators, and obtain the fine grid structure of the table. In the second stage, by taking a full consideration of the textual information in the table, we fuse the output features for each table grid from both vision and text modalities. Moreover, we achieve a higher precision in our experiments through providing additional textual features. Finally, we process the merging of these basic table grids in a self-regression manner. The corresponding merging results are learned through the attention mechanism. In our experiments, SEM achieves an average F1-Measure of 97.11% on the SciTSR dataset which outperforms other methods by a large margin. We also won the first place of complex tables and third place of all tables in Task-B of ICDAR 2021 Competition on Scientific Literature Parsing. Extensive experiments on other publicly available datasets further demonstrate the effectiveness of our proposed approach.

论文关键词:Table structure recognition,Self-regression,Attention mechanism,Encoder-decoder

论文评审过程:Received 11 August 2021, Revised 3 January 2022, Accepted 30 January 2022, Available online 1 February 2022, Version of Record 19 February 2022.

论文官网地址:https://doi.org/10.1016/j.patcog.2022.108565