Canonical forms for labelled trees and their applications in frequent subtree mining

作者:Yun Chi, Yirong Yang, Richard R. Muntz

摘要

Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. In this paper, we first present two canonical forms for labelled rooted unordered trees–the breadth-first canonical form (BFCF) and the depth-first canonical form (DFCF). Then the canonical forms are applied to the frequent subtree mining problem. Based on the BFCF, we develop a vertical mining algorithm, RootedTreeMiner, to discover all frequently occurring subtrees in a database of labelled rooted unordered trees. The RootedTreeMiner algorithm uses an enumeration tree to enumerate all (frequent) labelled rooted unordered subtrees. Next, we extend the definition of the DFCF to labelled free trees and present an Apriori-like algorithm, FreeTreeMiner, to discover all frequently occurring subtrees in a database of labelled free trees. Finally, we study the performance and the scalability of our algorithms through extensive experiments based on both synthetic data and datasets from real applications.

论文关键词:Canonical form, Frequent subtree, Labelled free tree, Labelled rooted unordered tree, Tree isomorphism

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-004-0180-7