Computing contingency tables from sparse ADtrees

作者:Fei Ding, Yi Zhuang

摘要

In data-mining algorithms contingency tables are frequently built from ADtrees, as ADtrees have been demonstrated to be an efficient data structure for caching sufficient statistics. This paper introduces three modifications. The first two use a one-dimensional array and a hash map for representing contingency tables, and the third uses the non-recursive approach to build contingency tables from sparse ADtrees. We implement algorithms to construct contingency tables with a two-dimensional array, a tree, a one-dimensional array, and a hash map using recursion and non-recursive approaches in Python. We empirically test these algorithms in five aspects with a large number of randomly generated datasets. We also apply the modified algorithms to Bayesian networks learning and test the performance improvements using three real-life datasets. We demonstrate experimentally that all three of these modifications improve algorithm performance. The improvements are more significant with higher arities and larger arity values.

论文关键词:Data mining, Contingency tables, Sparse ADtrees, Non-recursive algorithms, Bayesian networks learning

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-014-0624-z