Efficient discovery of correlated patterns using multiple minimum all-confidence thresholds

Authors: Uday Kiran Rage, Masaru Kitsuregawa

Abstract

Correlated patterns are an important class of regularities that exist in a database. Although there is no universally accepted best measure for judging the interestingness of a pattern, all-confidence is emerging as a popular measure for discovering these patterns. This is because the measure satisfies both the anti-monotonic and null-invariance properties. The former property makes pattern mining practicable in real-world applications. The latter property enables the user to discover patterns involving both frequent and rare items without generating a huge number of patterns. In this paper, we show that although the measure satisfies the null-invariance property, mining patterns containing both frequent and rare items with a single minimum all-confidence (minAllConf) threshold leads to a dilemma known as the “rare item problem.” At a high minAllConf, the discovered correlated patterns involving rare items are very short. At a low minAllConf, a combinatorial explosion can occur, producing too many patterns. To confront the problem, the paper introduces an alternative model based on the concept of multiple minAllConf thresholds. The proposed model generalizes the existing model of correlated patterns and allows the user to specify a different minAllConf for each pattern depending on the frequencies of its items. A pattern-growth algorithm, called GCoMine, is also proposed to discover the patterns. Experimental results show that GCoMine is efficient and that the proposed model can address the problem effectively.
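The all-confidence of a pattern X is commonly defined as sup(X) divided by the largest single-item support among the items of X. The following is a minimal sketch, not GCoMine: it computes all-confidence and checks every candidate pattern by brute force against a per-pattern threshold derived from per-item thresholds. The names `item_mac` and `aggregate` are hypothetical. Taking the minimum of the item thresholds is an assumption in the spirit of multiple-minimum-support models, not the paper's confirmed rule. Making all item thresholds equal recovers the single-minAllConf model.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def all_confidence(itemset: Tuple[str, ...], transactions: List[Transaction]) -> float:
    """all-confidence(X) = sup(X) / max over items i in X of sup({i})."""
    max_item_sup = max(support([i], transactions) for i in itemset)
    if max_item_sup == 0:
        return 0.0
    return support(itemset, transactions) / max_item_sup


def correlated_patterns(
    transactions: List[Transaction],
    item_mac: Dict[str, float],
    aggregate: Callable[[Iterable[float]], float] = min,
    max_len: int = 3,
) -> Dict[Tuple[str, ...], float]:
    """Brute-force search for patterns of length 2..max_len whose
    all-confidence reaches the pattern's own threshold.

    HYPOTHETICAL threshold rule: aggregate(item_mac[i] for i in pattern).
    `min` is only a placeholder; the paper's actual rule may differ.
    """
    items = sorted(set().union(*transactions))
    found: Dict[Tuple[str, ...], float] = {}
    for k in range(2, max_len + 1):
        for pattern in combinations(items, k):
            if support(pattern, transactions) == 0:
                continue  # skip patterns that never occur
            ac = all_confidence(pattern, transactions)
            if ac >= aggregate(item_mac[i] for i in pattern):
                found[pattern] = ac
    return found


if __name__ == "__main__":
    # Toy database: bread and milk are frequent, caviar and champagne are rare.
    db = [frozenset(t) for t in (
        {"bread", "milk"}, {"bread", "milk"},
        {"bread", "milk", "caviar", "champagne"},
        {"bread", "milk"}, {"bread", "caviar", "champagne"}, {"bread"},
    )]
    # Hypothetical per-item thresholds: lower for the rare items.
    mac = {"bread": 0.6, "milk": 0.6, "caviar": 0.3, "champagne": 0.3}
    for p, ac in sorted(correlated_patterns(db, mac).items()):
        print(p, round(ac, 3))
```

Because all-confidence is anti-monotonic, a real miner would prune every superset of a pattern that fails a threshold instead of enumerating all combinations as this sketch does.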

Keywords: Data mining, Knowledge discovery in databases, Correlated patterns, Rare item problem

Paper URL: https://doi.org/10.1007/s10844-014-0314-7