Hierarchical Independence Thresholding for learning Bayesian network classifiers

Authors:

Highlights:

Abstract:

Bayesian networks are powerful tools for knowledge representation and inference under uncertainty. However, learning an optimal Bayesian network classifier (BNC) is an NP-hard problem, since its topological complexity increases exponentially with the number of attributes. Researchers have proposed information-theoretic criteria to measure conditional dependence, and independence assumptions are introduced implicitly or explicitly to simplify the network topology of a BNC. In this paper, we clarify the mapping relationship between conditional mutual information and local topology, and then show that informational independence does not correspond to probabilistic independence: a criterion of probabilistic independence does not necessarily hold for an independence topology. A novel framework of semi-naive Bayesian operations, called Hierarchical Independence Thresholding (HIT), is presented to efficiently identify informational conditional independence and probabilistic conditional independence by applying an adaptive thresholding method; redundant edges are filtered out, and the learned topology fits the data better. Extensive experimental evaluation on 58 publicly available datasets reveals that when HIT is applied to BNCs (such as tree-augmented naive Bayes or the k-dependence Bayesian classifier), the resulting BNCs achieve classification performance competitive with state-of-the-art learners such as random forest and logistic regression.
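The abstract describes scoring candidate edges by conditional mutual information and filtering out those deemed independent by an adaptive threshold. As a rough illustration (not the paper's actual HIT procedure), the sketch below computes empirical conditional mutual information I(Xi; Xj | C) for each attribute pair and keeps only edges whose score exceeds the mean pairwise score; the mean is a hypothetical stand-in for the paper's adaptive threshold, and the function names are invented for this example.

```python
import numpy as np
from itertools import combinations

def conditional_mutual_information(xi, xj, c):
    """Empirical I(Xi; Xj | C) in nats from three discrete 1-D arrays."""
    cmi = 0.0
    for cv in np.unique(c):
        mask = c == cv
        pc = mask.mean()                      # P(C = cv)
        xi_c, xj_c = xi[mask], xj[mask]
        for a in np.unique(xi_c):
            for b in np.unique(xj_c):
                p_ab = np.mean((xi_c == a) & (xj_c == b))  # P(a, b | cv)
                if p_ab == 0:
                    continue
                p_a = np.mean(xi_c == a)      # P(a | cv)
                p_b = np.mean(xj_c == b)      # P(b | cv)
                cmi += pc * p_ab * np.log(p_ab / (p_a * p_b))
    return cmi

def filter_edges(X, y):
    """Score every attribute pair by CMI given the class and keep edges
    above the mean score -- a simple illustrative threshold, not HIT's."""
    d = X.shape[1]
    scores = {(i, j): conditional_mutual_information(X[:, i], X[:, j], y)
              for i, j in combinations(range(d), 2)}
    threshold = np.mean(list(scores.values()))
    return [edge for edge, s in scores.items() if s > threshold]
```

In a structure learner such as TAN or KDB, the surviving edges would then be the only candidates for augmenting the naive Bayes topology; pairs scored as conditionally independent contribute no edge.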

Keywords: Bayesian network, Hierarchical independence thresholding, Informational independence, Probabilistic independence, Adaptive thresholding

Article history: Received 22 June 2020, Revised 21 November 2020, Accepted 24 November 2020, Available online 25 November 2020, Version of Record 28 November 2020.

DOI: https://doi.org/10.1016/j.knosys.2020.106627