Searching structured documents

Abstract

Structured document interchange formats such as XML and SGML are ubiquitous; information retrieval systems that support structured searching are not. Structured searching can increase precision: a search for the author “Smith” in an unstructured corpus of documents specializing in iron-working could have lower precision than a structured search for “Smith as author” in the same corpus. Analysis of XML retrieval languages identifies additional functionality that must be supported, including searching at a given node in the document tree and searching for content broken across multiple nodes. A data structure is developed to support structured document searching, and its application to information retrieval is then demonstrated. Document ranking is examined and adapted specifically for structured searching.
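The “Smith” example can be illustrated with a minimal sketch. It is not the data structure developed in the paper; it only contrasts a search that ignores markup with one restricted to a named element. The two-document corpus, the element names (`article`, `author`, `body`), and both function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-document corpus; element names are illustrative only.
CORPUS = {
    "doc1": "<article><author>Smith</author><body>Tempering steel blades.</body></article>",
    "doc2": "<article><author>Jones</author><body>A smith heats iron in a forge.</body></article>",
}


def unstructured_search(term):
    """Match the term anywhere in the document text, ignoring markup."""
    hits = []
    for doc_id, xml in CORPUS.items():
        text = " ".join(ET.fromstring(xml).itertext())
        if term.lower() in text.lower():
            hits.append(doc_id)
    return sorted(hits)


def structured_search(term, tag):
    """Match the term only inside elements with the given tag."""
    hits = []
    for doc_id, xml in CORPUS.items():
        root = ET.fromstring(xml)
        if any(term.lower() in (el.text or "").lower() for el in root.iter(tag)):
            hits.append(doc_id)
    return sorted(hits)


print(unstructured_search("Smith"))          # both documents mention "smith"
print(structured_search("Smith", "author"))  # only the document authored by Smith
```

If only the document written by Smith is relevant, the unstructured query in this toy corpus retrieves one relevant document out of two, while the structured query retrieves one out of one.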

Keywords: Structured information retrieval, Indexing and searching, Vector space, Boolean searching, SGML and XML

Article history: Received 2 December 2002, Accepted 9 May 2003, Available online 7 June 2003.

Paper URL: https://doi.org/10.1016/S0306-4573(03)00041-4