Research on case retrieval of Bayesian network under big data

Authors:

Highlights:

Abstract

Although Bayesian network (BN) based case retrieval has greatly promoted the application of case-based reasoning (CBR) in engineering fields, it faces major challenges in the era of big data. First, the heavy computational load of BN learning on big data seriously hampers the efficiency of case retrieval. Second, as the data size grows, the accuracy of case retrieval deteriorates because existing methods for improving probability learning no longer suit the new setting. To address the first problem, this paper proposes the Within-Cross algorithm, which assigns computation tasks so as to improve parallel data processing and thereby the efficiency of case retrieval. To address the second problem, this paper proposes the Weighted Index Coefficient of Dirichlet Distribution (WICDD) algorithm, which first measures the influence of different factors on probability learning and then assigns a weight to each hyperparameter of the Dirichlet distribution to adjust the result of probability learning. With WICDD, probability learning is greatly improved, which in turn enhances the accuracy of case retrieval. Finally, extensive experiments are conducted to validate the effectiveness of the proposed method.
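
To make the idea of weighting Dirichlet hyperparameters concrete, the following minimal Python sketch computes the posterior-mean estimate of one conditional probability table row when each prior pseudo-count is scaled by a weight. It is an illustration only: the abstract does not state how WICDD derives its weights, so the function name `weighted_dirichlet_estimate` and the `weights` argument are hypothetical, and the weights are simply taken as given inputs.

```python
from typing import List, Sequence


def weighted_dirichlet_estimate(
    counts: Sequence[float],
    alphas: Sequence[float],
    weights: Sequence[float],
) -> List[float]:
    """Posterior-mean estimate of a categorical distribution.

    counts  : observed frequency of each state of the child node
    alphas  : Dirichlet hyperparameters (prior pseudo-counts)
    weights : hypothetical per-hyperparameter weights; how WICDD
              actually computes them is NOT given in the abstract
    """
    if not (len(counts) == len(alphas) == len(weights)):
        raise ValueError("counts, alphas and weights must have equal length")

    # Scale every prior pseudo-count by its weight.
    pseudo = [w * a for w, a in zip(weights, alphas)]

    total = sum(counts) + sum(pseudo)
    if total <= 0:
        raise ValueError("total of counts and weighted pseudo-counts must be positive")

    # Standard Dirichlet-multinomial posterior mean with the scaled prior.
    return [(n + p) / total for n, p in zip(counts, pseudo)]


if __name__ == "__main__":
    counts = [8, 2, 0]
    alphas = [1.0, 1.0, 1.0]
    # Uniform weights reduce to ordinary Dirichlet (Laplace-style) smoothing.
    print(weighted_dirichlet_estimate(counts, alphas, [1.0, 1.0, 1.0]))
    # Non-uniform weights shift prior mass toward the third state.
    print(weighted_dirichlet_estimate(counts, alphas, [0.5, 0.5, 2.0]))
```

With all weights equal to 1 the estimate reduces to ordinary Dirichlet smoothing; unequal weights change how strongly the prior pulls each state's probability, which is the kind of adjustment the abstract attributes to WICDD.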

Keywords: Case retrieval, Big data, BN model, Hadoop platform

Review history: Received 11 February 2018, Revised 5 August 2018, Accepted 16 August 2018, Available online 20 August 2018, Version of Record 19 November 2018.

Official paper URL: https://doi.org/10.1016/j.datak.2018.08.002