RainForest—A Framework for Fast Decision Tree Construction of Large Datasets

Authors: Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti

Abstract

Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework called RainForest for classification tree construction that separates the scalability aspects of algorithms for constructing a tree from the central features that determine the quality of the tree. The generic algorithm is easy to instantiate with specific split selection methods from the literature (including C4.5, CART, CHAID, FACT, ID3 and extensions, SLIQ, SPRINT, and QUEST).
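
The separation described in the abstract can be illustrated with a minimal sketch, which is not the paper's algorithm. It assumes a small in-memory dataset of categorical attributes stored as dictionaries. The generic builder `build_tree` only aggregates per-attribute class counts and partitions the data, while the split criterion is passed in as a function. All names here (`class_counts_by_value`, `info_gain_split`, `build_tree`) are hypothetical, and the ID3-style information gain criterion is just one possible plug-in.

```python
from collections import Counter, defaultdict
from math import log2


def class_counts_by_value(rows, labels, attr):
    """Count class labels per value of one attribute (the only data the splitter sees)."""
    counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        counts[row[attr]][label] += 1
    return counts


def entropy(counter):
    """Shannon entropy (in bits) of a class-label count table."""
    total = sum(counter.values())
    return -sum((n / total) * log2(n / total) for n in counter.values() if n)


def info_gain_split(stats, label_counts):
    """Hypothetical ID3-style criterion: return the attribute with the highest gain, or None."""
    total = sum(label_counts.values())
    base = entropy(label_counts)
    best_attr, best_gain = None, 0.0
    for attr, counts in stats.items():
        remainder = sum(sum(c.values()) / total * entropy(c) for c in counts.values())
        gain = base - remainder
        if gain > best_gain + 1e-12:  # require a strictly positive improvement
            best_attr, best_gain = attr, gain
    return best_attr


def build_tree(rows, labels, attrs, select_split):
    """Generic builder: aggregates counts and partitions; split quality lives in select_split."""
    label_counts = Counter(labels)
    majority = label_counts.most_common(1)[0][0]
    if len(label_counts) == 1 or not attrs:
        return majority  # leaf: pure node or no attributes left
    stats = {a: class_counts_by_value(rows, labels, a) for a in attrs}
    attr = select_split(stats, label_counts)
    if attr is None:
        return majority  # leaf: no split improves on the current node
    remaining = [a for a in attrs if a != attr]
    children = {}
    for value in stats[attr]:
        idx = [i for i, r in enumerate(rows) if r[attr] == value]
        children[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining, select_split
        )
    return (attr, children)
```

Swapping `info_gain_split` for another function with the same signature changes how splits are chosen without touching `build_tree`, which is the kind of decoupling the abstract describes. A scalable implementation would replace the in-memory lists with scans over disk-resident data.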

Keywords: data mining, decision trees, classification, scalability

Paper URL: https://doi.org/10.1023/A:1009839829793