Mining Text Using Keyword Distributions

Authors: Ronen Feldman, Ido Dagan, Haym Hirsh

Abstract

Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. This paper describes the KDT system for Knowledge Discovery in Text, in which documents are labeled by keywords, and knowledge discovery is performed by analyzing the co-occurrence frequencies of the various keywords labeling the documents. We show how this keyword-frequency approach supports a range of KDD operations, providing a suitable foundation for knowledge discovery and exploration for collections of unstructured text.
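As a rough illustration of the keyword co-occurrence idea described in the abstract, the following Python sketch counts how often pairs of keywords label the same document and computes the distribution of other keywords among documents carrying a given label. It is not the KDT system's implementation; the function names (`cooccurrence_counts`, `keyword_distribution`) and the toy document labels are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(doc_keywords):
    """Count how many documents each unordered keyword pair labels together."""
    counts = Counter()
    for keywords in doc_keywords:
        # Sort the de-duplicated labels so each pair has one canonical order.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


def keyword_distribution(doc_keywords, given):
    """Fraction of documents labeled `given` that also carry each other keyword."""
    selected = [set(k) for k in doc_keywords if given in k]
    counts = Counter(kw for kws in selected for kw in kws if kw != given)
    total = len(selected)
    return {kw: n / total for kw, n in counts.items()} if total else {}


# Hypothetical toy collection: each inner list is one document's keyword labels.
docs = [
    ["iran", "oil", "opec"],
    ["iran", "oil"],
    ["usa", "wheat"],
    ["usa", "oil"],
]

pairs = cooccurrence_counts(docs)
oil_dist = keyword_distribution(docs, "oil")
```

Comparing such distributions across keywords or across time slices of a collection is one plausible way to approach the distribution comparison and trend analysis named in the keywords, though the paper itself should be consulted for the actual measures used.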

Keywords: data mining, text mining, text categorization, distribution comparison, trend analysis

Paper URL: https://doi.org/10.1023/A:1008623632443