Learning distributed discrete Bayesian Network Classifiers under MapReduce with Apache Spark

Authors:

Highlights:

• Supervised classification on large-scale and high-dimensional data.

• Adaptation of a well-known approach, Bayesian Network Classifiers, to MapReduce.

• Deep analysis of the scalability properties of the new methods.

• Extensive experimentation on several synthetic and real datasets.

• Implementation on the state-of-the-art Apache Spark library; open-source code available.

Keywords: Bayesian Network Classifiers, MapReduce, Big Data, High dimensionality, Apache Hadoop, Apache Spark

Review history: Received 17 March 2016, Revised 9 June 2016, Accepted 12 June 2016, Available online 22 June 2016, Version of Record 20 December 2016.

Official paper URL: https://doi.org/10.1016/j.knosys.2016.06.013