Similarity-Based Models of Word Cooccurrence Probabilities

Authors: Ido Dagan, Lillian Lee, Fernando C. N. Pereira

Abstract

In many applications of natural language processing (NLP), it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations “eat a peach” and “eat a beach” is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on “most similar” words.
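The idea of estimating an unseen combination from "most similar" words can be illustrated with a minimal sketch. It is not the authors' model: the toy counts, the choice of cosine similarity, the neighbor count `k`, and all function names are assumptions made for illustration. The estimate for an unseen pair is a similarity-weighted average of the conditional probabilities observed for the nearest neighbors of the first word.

```python
from collections import Counter, defaultdict
import math

# Hypothetical (verb, object) pair counts; "eat peach" and "eat beach" are both unseen.
PAIR_COUNTS = Counter({
    ("eat", "apple"): 4,
    ("eat", "bread"): 2,
    ("devour", "apple"): 1,
    ("devour", "peach"): 3,
    ("visit", "beach"): 5,
})


def conditional_probs(pair_counts):
    """Maximum-likelihood estimates P(w2 | w1) from raw pair counts."""
    totals = Counter()
    for (w1, _), count in pair_counts.items():
        totals[w1] += count
    probs = defaultdict(dict)
    for (w1, w2), count in pair_counts.items():
        probs[w1][w2] = count / totals[w1]
    return probs


def cosine(p, q):
    """Cosine similarity between two sparse distributions (assumed measure)."""
    dot = sum(value * q.get(key, 0.0) for key, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def similarity_estimate(w1, w2, probs, k=2):
    """Similarity-weighted average of P(w2 | w1') over the k nearest neighbors of w1."""
    neighbors = sorted(
        ((cosine(probs[w1], probs[other]), other) for other in probs if other != w1),
        reverse=True,
    )[:k]
    total_weight = sum(weight for weight, _ in neighbors)
    if total_weight == 0:
        return 0.0  # no similar word carries any evidence
    weighted = sum(weight * probs[other].get(w2, 0.0) for weight, other in neighbors)
    return weighted / total_weight


PROBS = conditional_probs(PAIR_COUNTS)
peach_score = similarity_estimate("eat", "peach", PROBS)
beach_score = similarity_estimate("eat", "beach", PROBS)
```

In this toy setting, "devour" is the only word whose object distribution overlaps with that of "eat", so the unseen pair "eat peach" receives a higher estimate than "eat beach". A practical system would need a real corpus and would combine such an estimate with a smoothing scheme for the seen pairs.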

Keywords: Statistical language modeling, sense disambiguation

Paper URL: https://doi.org/10.1023/A:1007537716579