Smoothing and compression with stochastic k-testable tree languages

Authors:

Abstract

This paper describes techniques for learning probabilistic k-testable tree models, a generalization of the well-known k-gram models, which can be used to compress or classify structured data. These models are easy to infer from samples and allow for incremental updates. Moreover, as shown here, backing-off schemes can be defined to address data sparseness, a problem that often arises when trees are used to represent the data. These features make the models suitable for compressing structured data files at a better rate than string-based methods.
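
To make the idea of inferring such a model from samples more concrete, the following is a minimal sketch restricted to the simplest case, k = 2, where each node is predicted only from its own label and the labels of its children. It is an illustration under stated assumptions and not the paper's algorithm: the `(label, children)` tree representation, the class name `DepthTwoTreeModel`, and its method names are all hypothetical, probabilities are plain relative frequencies, and the probability of the root label is ignored.

```python
import math
from collections import Counter

# Assumption: a tree is a (label, children) pair, where children is a list of trees.


class DepthTwoTreeModel:
    """Relative-frequency model over depth-2 forks (label plus child labels)."""

    def __init__(self):
        self.rule_counts = Counter()   # (label, child labels) -> occurrences
        self.label_counts = Counter()  # label -> occurrences as a node

    def update(self, tree):
        """Add one sample tree; counts can be updated incrementally."""
        label, children = tree
        rule = (label, tuple(child[0] for child in children))
        self.rule_counts[rule] += 1
        self.label_counts[label] += 1
        for child in children:
            self.update(child)

    def rule_probability(self, label, child_labels):
        """Estimate P(child labels | label) as a relative frequency."""
        total = self.label_counts[label]
        if total == 0:
            return 0.0
        return self.rule_counts[(label, tuple(child_labels))] / total

    def log2_probability(self, tree):
        """Sum of log2 rule probabilities over all nodes (root prior ignored)."""
        label, children = tree
        p = self.rule_probability(label, [child[0] for child in children])
        if p == 0.0:
            return float("-inf")  # unseen fork: the data sparseness problem
        return math.log2(p) + sum(self.log2_probability(c) for c in children)


model = DepthTwoTreeModel()
model.update(("a", [("b", []), ("c", [])]))
model.update(("a", [("b", [])]))
bits = -model.log2_probability(("a", [("b", []), ("c", [])]))
```

In this sketch, `bits` is the ideal code length of the tree under the model, which is the quantity an entropy coder would approach when the model is used for compression. Any fork not seen in the sample receives probability zero, which is the kind of data sparseness that a backing-off scheme is meant to handle.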

Keywords: Tree grammars, Stochastic models, Smoothing, Backing-off, Data compression

Review history: Accepted 17 March 2004, Available online 2 April 2005.

Official paper page: https://doi.org/10.1016/j.patcog.2004.03.024