n-Grams and their implication to natural language understanding

Authors:


Abstract

This paper presents the results of a comparative information-theoretic study of Greek and English texts. The rank-frequency correlation (ρ) between the two languages is very high, ranging from 0.915 to 0.989. The results also include positional letter analyses, n-gram analyses, word analyses, empirical semantic correlations between Greek and English n-grams, and entropy calculations. The findings are of interest to researchers in natural language understanding, text processing and compression, speech synthesis and recognition, and error detection and correction. The results are notable because they cover the complete range of hierarchical text patterns: letters, n-grams (sub-word patterns), and words.
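The following minimal Python sketch illustrates three of the measures the abstract names: character n-gram frequencies, the Shannon entropy of an empirical symbol distribution, and a rank correlation. It is not the authors' method. The abstract does not say how Greek and English items were paired or which correlation formula was used. The sketch therefore assumes Spearman's ρ computed over frequency lists that have already been paired item by item. The helper names (`char_ngrams`, `entropy_bits`, `ranks`, `spearman_rho`) and the lowercase, whitespace-split tokenization are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams inside each word.

    Hypothetical tokenization: lowercase, split on whitespace,
    and never let an n-gram cross a word boundary.
    """
    counts = Counter()
    for word in text.lower().split():
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def entropy_bits(counts: Counter) -> float:
    """Shannon entropy H = -sum(p * log2 p) of the empirical distribution, in bits/symbol."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def ranks(values):
    """Return ranks with 1 = largest value; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks (tie-safe).

    Assumes x[i] and y[i] describe the same (paired) item in the two texts.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists of equal length >= 2")
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rank variance is zero; correlation undefined")
    return cov / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(char_ngrams("the cat sat on the mat", 2).most_common(3))
    print(entropy_bits(Counter("aabb")))  # 1.0 (two equiprobable symbols)
    print(spearman_rho([10, 8, 5, 1], [9, 7, 6, 2]))  # 1.0 (identical rank order)
```

Counting n-grams only within words is one design choice among several. A study could instead treat the space character as a symbol so that n-grams span word boundaries. Either choice changes the counts and the resulting entropy estimates.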

Keywords: n-Grams, Text analysis, Word frequencies, Entropy of language, Natural language analysis, Text patterns

Article history: Received 10 February 1989, Revised 9 June 1989, Available online 19 May 2003.

Official paper URL: https://doi.org/10.1016/0031-3203(90)90072-S