Three-dimensional Entity Resolution with JedAI

Authors:

Highlights:

• JedAI is an open-source system that lets users compose state-of-the-art individual methods into millions of end-to-end workflows for Entity Resolution.

• JedAI supports both batch and progressive Entity Resolution.

• JedAI supports both blocking-based and join-based Entity Resolution.

• JedAI supports both serialized and massively parallel execution (on top of Apache Spark).

• JedAI achieves effectiveness comparable to that of state-of-the-art (supervised) ER tools at a significantly lower running time.

Keywords: Entity Resolution, Blocking, Matching, Clustering, Batch methods, Progressive methods, Massive parallelization

Review history: Received 6 May 2020, Revised 22 May 2020, Accepted 23 May 2020, Available online 27 May 2020, Version of Record 28 May 2020.

Paper URL: https://doi.org/10.1016/j.is.2020.101565