Research on data consistency detection method based on interactive matching under sampling background

Authors:

Highlights:

Abstract

Multisource data are common in the era of big data. Detecting the consistency of multisource data is a basic problem in decision-making and is widely studied in both academic and applied fields. In this paper, consistency detection between big datasets is conducted as follows: (1) With if-then classification rules as the core description of a dataset, decision trees as the means of acquiring the rules, and the interactive matching of classification rules as the carrier, a measurement model of interactive matching under random sampling (SB-IMM for short) is established. (2) The performance of the SB-IMM is analyzed by combining the law of large numbers with experiments on several common UCI datasets. (3) The application of the SB-IMM in public policy-making is discussed, taking the consistency between the central and eastern subsets of the CHFS dataset as an example. The theoretical analysis and the experimental results show that the SB-IMM has good structural characteristics and interpretability, can provide theoretical support for the processing of big data, and has wide prospects for application.
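The abstract does not give the SB-IMM formula, so the following is only a minimal sketch of the general idea it describes: learn if-then rules from a random sample of each dataset, apply each rule set to the other dataset's sample, and average the cross-accuracies over repeated sampling. Everything named here is an assumption for illustration: a depth-1 decision tree (`fit_stump`) stands in for a full decision tree, the symmetric average in `matching_score` stands in for the paper's actual measure, and `make_data` generates synthetic data rather than UCI or CHFS data.

```python
import random

def make_data(n, flip, seed):
    """Synthetic 1-feature dataset (hypothetical): label = (x > 0.5), optionally flipped."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        rows.append(((x,), int(x > 0.5) ^ int(flip)))
    return rows

def majority(labels):
    """Majority label for binary labels; ties and empty lists default to 0."""
    return 1 if 2 * sum(labels) > len(labels) else 0

def fit_stump(rows):
    """Learn a depth-1 tree, i.e. the rule pair
    'if x[j] <= t then left_label else right_label', minimizing training errors."""
    best = None
    for j in range(len(rows[0][0])):
        for t in sorted({x[j] for x, _ in rows}):
            left = [y for x, y in rows if x[j] <= t]
            right = [y for x, y in rows if x[j] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    return best[1:]

def accuracy(rule, rows):
    """Fraction of rows whose label agrees with the rule's prediction."""
    j, t, l_lab, r_lab = rule
    hits = sum((l_lab if x[j] <= t else r_lab) == y for x, y in rows)
    return hits / len(rows)

def matching_score(data_a, data_b, sample_size=50, repeats=20, seed=0):
    """Assumed consistency measure: mean over repeated random samples of the
    average accuracy of A's rules on B's sample and B's rules on A's sample."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        sample_a = rng.sample(data_a, sample_size)
        sample_b = rng.sample(data_b, sample_size)
        rule_a, rule_b = fit_stump(sample_a), fit_stump(sample_b)
        scores.append((accuracy(rule_a, sample_b) + accuracy(rule_b, sample_a)) / 2)
    return sum(scores) / len(scores)

# Two datasets that follow the same rule, and one that follows the opposite rule.
same_a = make_data(200, flip=False, seed=1)
same_b = make_data(200, flip=False, seed=2)
opposite = make_data(200, flip=True, seed=3)

print("consistent pair:  ", matching_score(same_a, same_b))
print("inconsistent pair:", matching_score(same_a, opposite))
```

In this sketch, datasets generated by the same underlying rule score close to 1 and datasets generated by opposite rules score close to 0. Averaging over repeated samples is what lets a law-of-large-numbers argument apply to the estimate.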

Keywords: Interactive matching, Data consistency, Decision tree, Classification rules, Matching accuracy, Repeated sampling

Review history: Received 20 May 2022, Revised 10 August 2022, Accepted 11 August 2022, Available online 19 August 2022, Version of Record 1 September 2022.

Official paper URL: https://doi.org/10.1016/j.knosys.2022.109695