JointMatcher: Numerically-aware entity matching using pre-trained language models with attention concentration

Authors:

Highlights:

• The pitfalls overlooked by existing pre-trained LM-based EM methods are identified.

• A novel pre-trained LM-based EM model, JointMatcher, is developed (the general paradigm it builds on is sketched after these highlights).

• Two encoders are proposed to pay more attention to the important segments of the input record pair.

• Experimental results show that JointMatcher achieves strong performance even with limited training data.
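The highlights describe JointMatcher only at a high level. As a point of reference, the sketch below shows the generic pre-trained LM-based EM pipeline that such models extend: each record pair is serialized into a single text sequence and classified as match or non-match by a Transformer encoder. The COL/VAL serialization convention, the `bert-base-uncased` checkpoint, and the untrained classification head are illustrative assumptions, not the paper's actual architecture, which adds numerically-aware encoders with attention concentration on important segments.

```python
# Minimal sketch of the generic pre-trained LM-based EM paradigm
# (an assumption for illustration; NOT JointMatcher's exact model).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = non-match, 1 = match
)

def serialize(record: dict) -> str:
    # Flatten attribute/value pairs into a token stream; the COL/VAL
    # markers follow a common serialization convention (assumed here).
    return " ".join(f"COL {k} VAL {v}" for k, v in record.items())

left = {"title": "iPhone 13 Pro 128GB", "price": "999.00"}
right = {"title": "Apple iPhone 13 Pro (128 GB)", "price": "999"}

# Encode the pair jointly so self-attention can compare aligned segments
# (e.g., the numeric price tokens) across the two records.
inputs = tokenizer(serialize(left), serialize(right),
                   return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
# The classification head is randomly initialized here; in practice the
# model would first be fine-tuned on labeled match/non-match pairs.
print("match probability:", torch.softmax(logits, dim=-1)[0, 1].item())
```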


Keywords: Entity matching, Pre-trained language model, Attention concentration

Article history: Received 11 June 2021, Revised 8 May 2022, Accepted 9 May 2022, Available online 16 May 2022, Version of Record 17 June 2022.

DOI: https://doi.org/10.1016/j.knosys.2022.109033