Comparing Simple Recurrent Networks and n-Grams in a Large Corpus

Author: Paul Rodriguez

Abstract

The increased availability of text corpora and the growth of connectionism have stimulated a renewed interest in probabilistic models of language processing in computational linguistics and psycholinguistics. The Simple Recurrent Network (SRN) is an important connectionist model because it has the potential to learn temporal dependencies of unspecified length. In addition, many computational questions about the SRN's ability to learn dependencies between individual items extend to other models. This paper reports on experiments with an SRN trained on a large corpus and examines the ability of the network to learn bigrams, trigrams, etc., as a function of the size of the corpus. Performance is evaluated by an information-theoretic measure of prediction (or guess) ranking and output vector entropy. With enough training and hidden units, the SRN shows the ability to learn 5- and 6-gram dependencies, although learning an n-gram is contingent on its frequency and the relative frequency of other n-grams. In some cases, the network will learn relatively low-frequency deep dependencies before relatively high-frequency short ones if the deep dependencies do not require representational shifts in hidden unit space.
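As a rough illustration of the model class and evaluation measures the abstract refers to, the following is a minimal NumPy sketch, not the paper's implementation: the layer sizes, weight initialization, and function names are hypothetical. It shows an Elman-style SRN forward step, the Shannon entropy of the output vector, and the rank of the correct next item among the network's ordered guesses.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the output activations.
    e = np.exp(z - z.max())
    return e / e.sum()

class SimpleRecurrentNetwork:
    """Minimal Elman-style SRN: the previous hidden state is fed back
    as context input at the next time step (hypothetical sizes)."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.h = np.zeros(n_hidden)

    def step(self, x):
        # New hidden state from the current input plus the context (previous) state.
        self.h = np.tanh(self.W_in @ x + self.W_ctx @ self.h)
        # Output interpreted as a probability distribution over the next item.
        return softmax(self.W_out @ self.h)

def output_entropy(p):
    """Shannon entropy (in bits) of the network's output vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def guess_rank(p, target_index):
    """Rank of the correct next item among the network's ordered guesses
    (1 means the top guess was correct)."""
    order = np.argsort(-p)
    return int(np.where(order == target_index)[0][0]) + 1
```

A lower output entropy indicates a more confident next-item prediction, and a guess rank near 1 indicates that the correct continuation is among the network's top choices; together they track how well an n-gram dependency has been learned.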

Keywords: simple recurrent networks, n-gram modeling, entropy of English, connectionism


Paper URL: https://doi.org/10.1023/A:1023864622883