Relational Learning with Statistical Predicate Invention: Better Models for Hypertext

Authors: Mark Craven, Seán Slattery

Abstract

We present a new approach to learning hypertext classifiers that combines a statistical text-learning method with a relational rule learner. This approach is well suited to learning in hypertext domains because its statistical component allows it to characterize text in terms of word frequencies, whereas its relational component is able to describe how neighboring documents are related to each other by the hyperlinks that connect them. We evaluate our approach by applying it to tasks that involve (i) learning definitions for classes of pages, (ii) learning definitions for particular relations that exist between pairs of pages, and (iii) locating a particular class of information in the internal structure of pages. Our experiments demonstrate that this new approach is able to learn more accurate classifiers than either of its constituent methods alone.

Keywords: relational learning, text categorization, predicate invention, Naive Bayes

Paper URL: https://doi.org/10.1023/A:1007676901476