An analysis of ill-formed input in natural language queries to document retrieval systems

Authors:

Abstract

We analyzed natural language document retrieval queries from the Thomas Cooper Library at the University of South Carolina to investigate the frequency of various types of ill-formed input, such as spelling errors, co-occurrence violations, conjunctions, ellipsis, and missing or incorrect punctuation. The primary goal was to determine whether there is a significant need to study ill-formed input in detail. We found that most of the queries were sentence fragments and that many of them contained some type of ill-formed input. Conjunctions caused the most problems, followed by punctuation errors. Spelling errors occurred in a small number of the queries. The remaining types of ill-formed input considered, ellipsis and co-occurrence violations, were not found in the queries.

Keywords:

Review history: Received 17 September 1990, Accepted 17 May 1991, Available online 13 July 2002.

Official paper URL: https://doi.org/10.1016/0306-4573(91)90002-4