Bayesian analysis of GUHA hypotheses

Authors: Robert Piché, Marko Järvenpää, Esko Turunen, Milan Šimůnek

Abstract

The LISp-Miner system for data mining and knowledge discovery uses the GUHA method to comb through a large database and find 2 × 2 contingency tables that satisfy a condition given by generalised quantifiers and thereby suggest possible relations between attributes. In this paper, we show how Bayesian statistical methods can provide a more detailed interpretation of the data in the tables found by GUHA. Using a multinomial sampling model and a Dirichlet prior, we derive posterior distributions for parameters that correspond to GUHA generalised quantifiers. Examples illustrate the new Bayesian post-processing tools implemented in LISp-Miner. A statistical model for the analysis of contingency tables for data from two subpopulations is also presented.
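As a rough illustration of the multinomial-Dirichlet idea described in the abstract, the sketch below estimates a posterior probability for one quantifier-like parameter of a 2 × 2 table. It is not the LISp-Miner implementation, and the paper's exact parameterisation may differ. The function name `posterior_prob_confidence`, the uniform Dirichlet(1, 1, 1, 1) prior, the example counts, and the choice of the ratio θ_a / (θ_a + θ_b) (the population analogue of the confidence a / (a + b)) are all assumptions made for illustration.

```python
import random


def posterior_prob_confidence(counts, threshold, prior=(1.0, 1.0, 1.0, 1.0),
                              n_samples=20000, seed=0):
    """Monte Carlo estimate of P(theta_a / (theta_a + theta_b) >= threshold | data).

    counts: observed 2 x 2 table cells (a, b, c, d).
    prior:  Dirichlet parameters (assumed uniform here, not taken from the paper).
    """
    rng = random.Random(seed)
    # Conjugacy: Dirichlet prior + multinomial counts gives a Dirichlet posterior.
    post = [p + n for p, n in zip(prior, counts)]
    hits = 0
    for _ in range(n_samples):
        # A Dirichlet draw is a vector of independent Gamma draws, normalised.
        g = [rng.gammavariate(alpha, 1.0) for alpha in post]
        total = sum(g)
        theta = [x / total for x in g]
        if theta[0] / (theta[0] + theta[1]) >= threshold:
            hits += 1
    return hits / n_samples


# Hypothetical table (a, b, c, d); these numbers are not from the paper.
table = (40, 10, 20, 30)
prob = posterior_prob_confidence(table, threshold=0.7)
print(f"Estimated posterior probability that the confidence is at least 0.7: {prob:.3f}")
```

Instead of a yes/no verdict on whether the observed ratio a / (a + b) clears the threshold, the result is a probability that the underlying parameter does, which is the kind of finer-grained reading of a table that the abstract refers to.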

Keywords: Data mining, GUHA, Contingency table, Bayesian statistics

Paper URL: https://doi.org/10.1007/s10844-013-0255-6