Concept discovery on relational databases: New techniques for search space pruning and rule quality improvement

Authors:

Highlights:

Abstract

Multi-relational data mining has become popular due to the limitations of propositional problem definitions in structured domains and the tendency to store data in relational databases. Several relational knowledge discovery systems have been developed that employ various search strategies, heuristics, language pattern limitations, and hypothesis evaluation criteria to cope with an intractably large search space and to generate high-quality patterns. In this work, we introduce an ILP-based concept discovery framework named Concept Rule Induction System (CRIS), which includes new approaches for search space pruning and new features for rule quality improvement, such as defining aggregate predicates and handling numeric attributes. In CRIS, all target instances are considered together, which leads to the construction of more descriptive rules for the concept. This property also makes it possible to use aggregate predicates more accurately in concept rule construction. Moreover, it facilitates the construction of transitive rules. A set of experiments is conducted to evaluate the performance of the proposed method in terms of accuracy and coverage.

Keywords: ILP, Data mining, MRDM, Concept discovery, Transitive rules, Support, Confidence

Review history: Received 4 February 2010, Revised 20 April 2010, Accepted 21 April 2010, Available online 28 April 2010.

Official paper URL: https://doi.org/10.1016/j.knosys.2010.04.011