Vacillatory learning of nearly minimal size grammars

Authors:


Abstract

In Gold's influential language learning paradigm a learning machine converges in the limit to one correct grammar. In an attempt to generalize Gold's paradigm, Case considered the question of whether people might converge to vacillating between up to (some integer) n > 1 distinct, but equivalent, correct grammars. He showed that larger classes of languages can be algorithmically learned (in the limit) by converging to up to n + 1 rather than up to n correct grammars. He also argued that, for "small" n > 1, it is plausible that people might sometimes converge to vacillating between up to n grammars. The insistence on small n was motivated by the consideration that, for "large" n, at least one of the n grammars would be too large to fit in people's heads. Of course, even for Gold's n = 1 case, the single grammar converged to in the limit may be infeasibly large. An interesting complexity restriction to place, then, on the final grammar(s) converged to in the limit is that they all have small size. In this paper we study some of the trade-offs in learning power involved in making a well-defined version of this restriction. We show, and exploit as a tool, the desirable property that learning power under our size-restricted criteria (for successful learning) is independent of the underlying acceptable programming systems. We characterize the power of our size-restricted criteria and use this characterization to prove that some classes of languages which can be learned by converging in the limit to up to n + 1 nearly minimal size correct grammars cannot be learned by converging to up to n unrestricted grammars, even if these latter grammars are allowed to have a finite number of anomalies (i.e., mistakes) per grammar. We also show that there is no loss of learning power in demanding that the final grammars be nearly minimal size iff one is willing to tolerate an unbounded, finite number of anomalies in the final grammars and there is a constant bound on the number of different grammars converged to in the limit. Hence, if we allow an unbounded, finite number of anomalies in the final grammars and the number of different grammars converged to in the limit is unbounded but finite (or if there is a constant bound on the number of anomalies allowed in the final grammars), then there is a loss of learning power in requiring that the final grammars be nearly minimal size. These results do not always match what might be expected from the cases, previously examined by Freivalds, Kinber, and Chen, of learning nearly minimal size programs for functions.
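For orientation, the following is a rough sketch, in LaTeX, of the identification criteria the abstract alludes to, written in notation that is standard in the inductive-inference literature (M a learning machine, T a text, i.e., an enumeration of a language L, and W_p the language accepted by grammar p); the paper's own formal definitions, in particular of anomalies and of "nearly minimal size," may differ in detail.

% Gold-style identification in the limit (often written TxtEx):
\[
M \ \mathrm{TxtEx}\text{-identifies}\ L
\iff
(\forall \text{ texts } T \text{ for } L)(\exists p)\,
\bigl[\, M \text{ on } T \text{ converges to } p \ \wedge\ W_p = L \,\bigr].
\]

% Case's vacillatory identification (often written TxtFex_n): past some point
% M outputs only grammars from a finite set D of at most n grammars, each correct for L.
\[
M \ \mathrm{TxtFex}_n\text{-identifies}\ L
\iff
(\forall \text{ texts } T \text{ for } L)(\exists D,\ |D| \le n)\,
\bigl[\, M \text{ on } T \text{ eventually outputs only grammars from } D
\ \wedge\ (\forall p \in D)\,[W_p = L] \,\bigr].
\]

Allowing up to $a$ anomalies per final grammar replaces the condition $W_p = L$ by $W_p =^a L$ (i.e., $W_p$ and $L$ differ on at most $a$ elements); the size-restricted criteria studied in the paper additionally require each grammar in $D$ to be of nearly minimal size for $L$.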

Keywords:

Article history: Received 22 June 1992, Revised 15 April 1993, Available online 19 August 2005.

DOI: https://doi.org/10.1016/S0022-0000(05)80046-7