An extended vector-processing scheme for searching information in hypertext systems

Abstract

Searching for information in a hypertext by navigation alone is not an easy task, especially when the number of nodes and/or links becomes very large. A query-based access mechanism must therefore be provided to complement the navigational tools inherent in hypertext systems. Most mechanisms currently proposed are based on conventional information retrieval models, which consider documents as independent entities and ignore hypertext links. To promote the use of other information retrieval mechanisms adapted to hypertext systems, this study attempts to answer the following questions: (1) How can we integrate the information given by hypertext links into an information retrieval scheme? (2) Are these hypertext links (and link semantics) clues to the enhancement of retrieval effectiveness? (3) If so, how can we use them? Two solutions are: (a) using a default weight function based on link type, or assigning the same strength to all link types; or (b) using a specific weight for each particular link, i.e. the level of association or a similarity measure. This study proposes an extended vector-processing scheme that extracts additional information from hypertext links to enhance retrieval effectiveness. To carry out our investigations, we have built a hypertext based on two medium-size collections, the CACM and CISI collections. The hypergraph is composed of explicit links (bibliographic references) and computed links, the latter based either on bibliographic information (bibliographic coupling, co-citation) or on hypertext links established according to document representatives (nearest neighbor).
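The ideas in the abstract can be illustrated with a small sketch. The abstract does not give the scheme's actual formula, so the combination rule below (a document's own cosine score plus a damped, link-weighted share of its neighbours' scores) is only an assumption. The function names, the `alpha` parameter, and the toy data are likewise hypothetical. The definitions of bibliographic coupling (number of shared references) and co-citation (number of documents citing both) are the standard ones.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bibliographic_coupling(refs, a, b):
    """Number of references shared by documents a and b."""
    return len(refs.get(a, set()) & refs.get(b, set()))


def co_citation(refs, a, b):
    """Number of documents whose reference list contains both a and b."""
    return sum(1 for cited in refs.values() if a in cited and b in cited)


def extended_scores(query, docs, links, alpha=0.5):
    """Assumed combination rule, not the paper's formula:
    score(d) = sim(q, d) + alpha * sum(weight(d, n) * sim(q, n)) over links d -> n.
    """
    base = {d: cosine(query, vec) for d, vec in docs.items()}
    scores = dict(base)
    for (src, dst), weight in links.items():
        scores[src] += alpha * weight * base[dst]
    return scores


# Toy collection (hypothetical data, for illustration only).
docs = {
    "d1": {"hypertext": 1.0, "retrieval": 1.0},
    "d2": {"navigation": 1.0},
    "d3": {"retrieval": 1.0},
}
# refs[d] is the set of documents that d cites (explicit links).
refs = {"d1": {"d3"}, "d2": {"d1", "d3"}}

# Option (a) from the abstract: the same strength for every link.
links = {(src, dst): 1.0 for src, cited in refs.items() for dst in cited}

query = {"retrieval": 1.0}
base = {d: cosine(query, vec) for d, vec in docs.items()}
scores = extended_scores(query, docs, links)
```

In this toy example `d2` shares no term with the query, so a conventional vector model gives it a score of zero. It still receives a positive score here because it cites `d1` and `d3`, which do match. Option (b) from the abstract would replace the uniform weight `1.0` with a per-link value such as a coupling count or a similarity measure.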