A novel ensemble learning approach to unsupervised record linkage

Authors:

Highlights:

• A novel unsupervised approach to record linkage is proposed.

• The approach combines ensemble learning and automatic self learning.

• An ensemble of diverse self learning models is generated by applying different string similarity metric schemes.

• Applying ensemble learning alleviates the need to select the most suitable similarity metric scheme and improves on the performance of an individual self learning model.

• The proposed method achieved results comparable to those of supervised methods.
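
The idea summarized in the highlights can be illustrated with a minimal sketch. This is not the authors' algorithm: the three similarity functions, the seed fraction, the nearest-centroid rule standing in for a trained classifier, and the names `self_learn` and `ensemble_link` are all assumptions made for illustration. Each metric produces its own self-labelled model (seeded from the most and least similar pairs), and the ensemble takes a majority vote.

```python
from difflib import SequenceMatcher


def seq_ratio(a, b):
    # Character-level similarity from the standard library.
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a, b):
    # Overlap of whitespace-separated tokens.
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def bigram_dice(a, b):
    # Dice coefficient over character bigram sets.
    ga = {a[i:i + 2] for i in range(len(a) - 1)}
    gb = {b[i:i + 2] for i in range(len(b) - 1)}
    return 2 * len(ga & gb) / (len(ga) + len(gb)) if ga or gb else 1.0


def self_learn(scores, seed_frac=0.25):
    # Hypothetical self learning step: take the lowest and highest scores
    # as automatic non-match and match seeds, then label every pair by the
    # nearer seed centroid (a stand-in for training a real classifier).
    ranked = sorted(scores)
    k = max(1, int(len(ranked) * seed_frac))
    neg_centroid = sum(ranked[:k]) / k
    pos_centroid = sum(ranked[-k:]) / k
    return [abs(s - pos_centroid) < abs(s - neg_centroid) for s in scores]


def ensemble_link(pairs, metrics=(seq_ratio, token_jaccard, bigram_dice)):
    # One self-learned model per similarity metric, combined by majority vote.
    votes = [
        self_learn([metric(a.lower(), b.lower()) for a, b in pairs])
        for metric in metrics
    ]
    return [2 * sum(pair_votes) > len(metrics) for pair_votes in zip(*votes)]


pairs = [
    ("John Smith", "Jon Smith"),
    ("Mary Jones", "Mary Jones"),
    ("Acme Corp", "Zeta Ltd"),
    ("Peter Brown", "Linda White"),
]
print(ensemble_link(pairs))
```

In this toy input the token-based metric alone labels the "John Smith" / "Jon Smith" pair as a non-match, while the two character-based metrics outvote it, which is the kind of robustness to metric choice the highlights describe.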

Keywords: Unsupervised record linkage, Data matching, Classification, Ensemble learning

Review history: Received 21 October 2016, Revised 25 June 2017, Accepted 27 June 2017, Available online 28 June 2017, Version of Record 12 July 2017.

Official paper URL: https://doi.org/10.1016/j.is.2017.06.006