Improving information retrieval by combining user profile and document segmentation

Authors:

Abstract

Due to the ever-increasing quantity of available information, which users have to scan in order to find relevant items, noise has become a major issue in the implementation and use of information retrieval systems. The aim of this study was to design an information retrieval system permitting the “personalization” of search by taking the user profile into account.
A pre-orientation system was first developed to give access to a personalized sub-corpus. To limit noise, the textual material offered to the user is reduced and contains only those sections (units) of the document that interest him and are significant to him (where textual material is used in the sense of document units to be processed by content analysis in order to build descriptions of the documents). In this way, the documents are structured on the basis of utility functions. The selected document units are part of the sub-corpus defined by the pre-orientation system.
Next, the profile of each user is characterized by determining competence in a given field and at different levels. Each user is characterized by:
- stable information, related to the person rather than to a particular search. This information provides a general description of the user and his habits;
- variable information, related to a specific search. The priority here is to describe the objective of the search (search may be either exhaustive or non-exhaustive; it may concern specialized or popular publications, etc.).
The function of the pre-orientation system is to associate a given user profile with a set of characteristics applying to document units. Search is then applied only to the subset of document units selected as relevant to the user according to his profile. Document units are characterized not on the basis of thematic criteria related to content, but on the basis of criteria relating to utility.
The objective was to propose a hypothesis on the different parameters determining user profile and document unit characteristics, and to test such a hypothesis using an existing information retrieval system incorporating full-text natural language processing tools.
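The pre-orientation step described above can be illustrated with a minimal Python sketch. It is not the authors' system: the class names, the utility labels ("specialized"/"popular", "definition"/"method"/"result"), the two-level competence scale, and the mapping rules in `pre_orient` are all hypothetical, chosen only to show how a profile could select a sub-corpus of document units before any search is run. Plain term matching stands in for the full-text natural language processing tools mentioned in the abstract.

```python
from dataclasses import dataclass


@dataclass
class DocumentUnit:
    doc_id: str
    text: str
    level: str      # hypothetical utility label: "specialized" or "popular"
    function: str   # hypothetical utility label: "definition", "method", "result"


@dataclass
class UserProfile:
    # Stable information: tied to the person, not to one search.
    competence: dict    # field name -> "novice" or "expert" (hypothetical scale)
    # Variable information: tied to the current search.
    domain: str         # field the current search concerns
    exhaustive: bool    # exhaustive vs. non-exhaustive search


def pre_orient(profile):
    """Associate a profile with the unit characteristics it should see.

    The rules below are illustrative assumptions, not taken from the paper.
    """
    expert = profile.competence.get(profile.domain, "novice") == "expert"
    levels = {"specialized"} if expert else {"popular"}
    if profile.exhaustive:
        levels = {"specialized", "popular"}
    functions = {"method", "result"} if expert else {"definition", "result"}
    return levels, functions


def sub_corpus(units, profile):
    """Keep only units whose utility labels match the profile."""
    levels, functions = pre_orient(profile)
    return [u for u in units if u.level in levels and u.function in functions]


def search(units, profile, query):
    """Run a naive all-terms match, restricted to the personalized sub-corpus."""
    terms = query.lower().split()
    return [u for u in sub_corpus(units, profile)
            if all(t in u.text.lower() for t in terms)]


if __name__ == "__main__":
    corpus = [
        DocumentUnit("d1", "Noise is defined as irrelevant items", "popular", "definition"),
        DocumentUnit("d1", "We measure noise with a precision test", "specialized", "method"),
        DocumentUnit("d2", "Noise dropped after segmentation", "specialized", "result"),
    ]
    novice = UserProfile({"ir": "novice"}, domain="ir", exhaustive=False)
    for unit in search(corpus, novice, "noise"):
        print(unit.doc_id, unit.function)
```

Note that the filter uses only utility labels attached to units, never their topic, which mirrors the abstract's distinction between utility criteria and thematic criteria.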

Keywords: User profile, Utility criteria, End-user, Full-text database, Evaluation, Noise limitation, Segmentation of text

Review history: Available online 23 February 1999.

Official paper URL: https://doi.org/10.1016/0306-4573(95)00062-3