Using structural similarity for clustering XML documents

Authors: Ali Aïtelhadj, Mohand Boughanem, Mohamed Mezghiche, Fatiha Souam

Abstract

In this paper, we describe a method for clustering XML documents whose goal is to group documents that share similar structures. Our approach has two steps. We first automatically extract the structure of each XML document to be classified. This extracted structure is then used as a representation model to classify the corresponding document. The idea behind the clustering is that XML documents sharing similar structures are more likely to match the structural part of the same query. Finally, we tested our algorithms experimentally on both real data (the ACM SIGMOD Record corpus) and synthetic data. The results demonstrate the value of our approach.
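
The two steps described in the abstract (extract a structure, then group documents by structural similarity) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the representation or the similarity measure, so the sketch assumes a set of root-to-node tag paths as the structure, Jaccard similarity as the measure, and a greedy single-pass grouping against a fixed threshold. The function names `tag_paths`, `jaccard`, and `cluster`, the threshold value, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Step 1 (assumed representation): the set of root-to-node tag paths.

    Text content and attributes are ignored, so only structure remains.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def jaccard(a, b):
    """Assumed similarity measure: shared paths divided by all paths."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.5):
    """Step 2 (assumed strategy): greedy single-pass grouping.

    Each document joins the first cluster whose representative (the
    structure of the cluster's first member) is similar enough;
    otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative_paths, member_indices)
    for index, text in enumerate(docs):
        paths = tag_paths(text)
        for representative, members in clusters:
            if jaccard(representative, paths) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((paths, [index]))
    return [members for _, members in clusters]


# Hypothetical sample documents, not taken from the SIGMOD Record corpus.
docs = [
    "<article><title>a</title><author>x</author></article>",
    "<article><title>b</title><author>y</author><author>z</author></article>",
    "<issue><volume>1</volume><number>2</number></issue>",
]
groups = cluster(docs)
print(groups)  # [[0, 1], [2]]
```

The first two documents have identical path sets despite differing in content and in the number of `author` elements, so they fall into one cluster, while the third shares no paths with them. A path-set representation discards sibling order and repetition; a tree-based measure would retain more structure at higher cost.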

Keywords: Clustering, Context, Node, Similarity, Structural classification, Threshold, Tree

Paper URL: https://doi.org/10.1007/s10115-011-0421-5