Storing and retrieving word phrases

Authors:

Abstract

We have developed methods for storing and retrieving large dictionaries of word pairs and other multi-word phrases based on hashed indexing. From analysis of text samples we have derived Zipfian laws for the frequency distributions of word pairs and longer phrases. We show where these Zipfian curves cross and deduce that the number of multi-word phrases which occur frequently in text is surprisingly small, of the same order of magnitude as the number of individual word-types in a text. Dictionaries of phrases are therefore amenable to fast processing with modest computer equipment. Finally, we suggest that in stylistic analysis word phrases might better discriminate between authors than do single words.
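As a rough illustration of the idea of a hashed phrase dictionary, the sketch below counts adjacent word pairs in a hash table and lists their frequencies by rank, which is the raw material for a Zipf-style rank-frequency curve. It is not the authors' method: the whitespace tokenizer, the use of Python's built-in `Counter` as the hashed index, and the function names `word_pairs`, `build_pair_index`, and `rank_frequency` are all assumptions made for this example.

```python
from collections import Counter


def word_pairs(text):
    """Return adjacent word pairs from lowercased, whitespace-split text.

    Assumption: a plain whitespace tokenizer stands in for whatever
    word segmentation the paper actually used.
    """
    words = text.lower().split()
    return zip(words, words[1:])


def build_pair_index(text):
    """Count each adjacent word pair.

    Counter is a hash table keyed by the (word, word) tuple, so lookup
    and update of a pair take constant time on average.
    """
    return Counter(word_pairs(text))


def rank_frequency(index):
    """Return (rank, frequency) tuples, most frequent pair first."""
    freqs = sorted(index.values(), reverse=True)
    return list(enumerate(freqs, start=1))


sample = "the cat sat on the mat and the cat ran"
index = build_pair_index(sample)
ranked = rank_frequency(index)
```

On a real corpus, plotting the logarithm of frequency against the logarithm of rank from `rank_frequency` would be one way to inspect whether the pair counts follow a Zipfian law; longer phrases could be handled the same way by zipping more shifted copies of the word list.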

Keywords:

Review history: Received 28 September 1984; available online 18 July 2002.

Official paper URL: https://doi.org/10.1016/0306-4573(85)90106-2