Redescription mining augmented with random forest of multi-target predictive clustering trees

Authors: Matej Mihelčić, Sašo Džeroski, Nada Lavrač, Tomislav Šmuc

Abstract

In this work, we present a redescription mining algorithm that uses a Random Forest of Predictive Clustering Trees (RFPCTs) to generate and iteratively improve a set of redescriptions. The approach uses information about element membership in different queries, generated from a single constructed PCT, to explore the redescription space, while queries obtained from the Random Forest of PCTs increase candidate diversity. The approach can produce highly accurate, statistically significant redescriptions described by Boolean, nominal or numerical attributes. Unlike current tree-based approaches, which use multi-class or binary classification, we explore the benefits of using multi-label classification and multi-target regression to create redescriptions. A major benefit of the approach, compared to other state-of-the-art solutions, is that it does not require specifying a minimal threshold on redescription accuracy to obtain a highly accurate, optimized set of redescriptions. The process of Random Forest-based augmentation and the different modes of redescription set creation are evaluated on three datasets with different properties. We use the same datasets to compare the performance of our algorithm to state-of-the-art redescription mining approaches.
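The abstract refers to redescription accuracy and statistical significance without defining them. In the redescription mining literature, accuracy is commonly measured as the Jaccard index of the supports of the two queries (the sets of elements each query describes), and significance is often assessed with a binomial test against the hypothesis that the two queries are independent. The sketch below assumes those two definitions; it is not the authors' implementation, and the function names and toy supports are hypothetical.

```python
from math import comb


def jaccard(supp1: set, supp2: set) -> float:
    """Redescription accuracy as |supp1 & supp2| / |supp1 | supp2| (assumed definition)."""
    union = supp1 | supp2
    if not union:
        return 0.0
    return len(supp1 & supp2) / len(union)


def binomial_pvalue(supp1: set, supp2: set, n_elements: int) -> float:
    """P(X >= |supp1 & supp2|) for X ~ Binomial(n, p1 * p2).

    Assumes the two queries are independent, so an element lies in both
    supports with probability p1 * p2, where p_i = |supp_i| / n.
    """
    p = (len(supp1) / n_elements) * (len(supp2) / n_elements)
    k = len(supp1 & supp2)
    return sum(
        comb(n_elements, i) * p**i * (1 - p) ** (n_elements - i)
        for i in range(k, n_elements + 1)
    )


# Hypothetical toy example: 10 elements, two queries over different attribute views.
q1_support = {0, 1, 2, 3}
q2_support = {1, 2, 3, 4}
print(jaccard(q1_support, q2_support))
print(binomial_pvalue(q1_support, q2_support, 10))
```

A redescription is then considered good when its Jaccard index is high and its p-value is small, that is, when the two queries overlap far more than chance would predict.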

Keywords: Knowledge discovery, Redescription mining, Random forest, Predictive clustering trees, World countries, Computer science bibliography, Bioclimatic niches

Paper URL: https://doi.org/10.1007/s10844-017-0448-5