Incremental mining of information interest for personalized web scanning

Authors:

Abstract

Businesses and individuals often organize their information of interest (IOI) into a hierarchy of folders (or categories). A personalized folder hierarchy gives each user a natural way to manage and use his/her IOI, with each folder corresponding to an interest type. Because such interests are relatively long-term, continuous web scanning is essential, and it should be directed by precise and comprehensible specifications of the interest. A precise specification can direct the scanner to the web spaces that deserve scanning; a specification comprehensible to the user facilitates manual refinement; and a specification comprehensible to information providers (e.g. Internet search engines) facilitates the identification of proper seed sites from which to start scanning. However, expressing such specifications is difficult (and often impractical) for the user, since each interest type is usually defined only implicitly and collectively by the content (i.e. documents) of the corresponding folder, which may itself evolve over time. In this paper, we present an incremental text mining technique that efficiently identifies the user's current interests by mining the user's information folders. The specification mined for each interest type expresses the context of that interest type in conjunctive normal form, which is comprehensible to general users and information providers. The specification is also shown to be more precise in directing the scanner to sites that are more likely to provide IOI. The user may thus simply maintain his/her folders and continually receive IOI, without having to attend to the difficult tasks of interest specification and seed identification.
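
To make the notion of a conjunctive-normal-form specification concrete, the following is a minimal sketch, not the paper's mining algorithm. It assumes a specification is a conjunction of clauses, each clause a disjunction of terms, and that a page matches when every clause has at least one term present. The names `InterestSpec`, `tokenize`, `matches`, and `to_query`, as well as the example terms, are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical CNF specification: a conjunction (AND) of clauses,
# where each clause is a disjunction (OR) of terms.
InterestSpec = list[set[str]]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(spec: InterestSpec, text: str) -> bool:
    """Return True if every clause has at least one term present in the text."""
    tokens = tokenize(text)
    return all(clause & tokens for clause in spec)


def to_query(spec: InterestSpec) -> str:
    """Render the specification as a readable Boolean query string."""
    return " AND ".join(
        "(" + " OR ".join(sorted(clause)) + ")" for clause in spec
    )


# Illustrative specification (assumed terms, not taken from the paper).
spec: InterestSpec = [{"mining", "clustering"}, {"text", "document"}]
print(to_query(spec))
print(matches(spec, "Incremental text mining of folders"))
print(matches(spec, "Mining gold in Alaska"))
```

Rendering the specification as a plain Boolean query is one way such a form could be read by a user or submitted to a search engine when looking for seed sites; how the clauses are actually mined from folder contents is not described in the abstract and is left out of this sketch.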

Keywords: Context of information interest, Precise interest specifications, Comprehensible interest specifications, Incremental text mining, Web scanning

Article history: Received 7 February 2003, Revised 13 May 2004, Accepted 5 July 2004, Available online 12 August 2004.

Official paper URL: https://doi.org/10.1016/j.is.2004.07.001