Asking Questions to Minimize Errors

Authors:

Abstract

A number of efficient learning algorithms achieve exact identification of an unknown function from some class using membership and equivalence queries. Using a standard transformation, such algorithms can easily be converted to on-line learning algorithms that use membership queries. Under such a transformation, the number of equivalence queries made by the query algorithm directly corresponds to the number of mistakes made by the on-line algorithm. In this paper we consider several of the natural classes known to be learnable in this setting, and investigate the minimum number of equivalence queries with accompanying counterexamples (or, equivalently, the minimum number of mistakes in the on-line model) that can be made by a learning algorithm that makes a polynomial number of membership queries and uses polynomial computation time. We are able both to reduce the number of equivalence queries used by the previous algorithms and often to prove matching lower bounds. As an example, consider the class of DNF formulas over n variables with at most k = O(log n) terms. Previously, the algorithm of Blum and Rudich provided the best known upper bound of 2^{O(k)} log n for the minimum number of equivalence queries needed for exact identification. We greatly improve on this upper bound, showing that exactly k counterexamples are needed if the learner knows k a priori, and exactly k+1 counterexamples are needed if the learner does not know k a priori. This exactly matches known lower bounds of Bshouty and Cleve. For many of our results we obtain a complete characterization of the trade-off between the number of membership and equivalence queries needed for exact identification. The classes we consider here are monotone DNF formulas, Horn sentences, O(log n)-term DNF formulas, read-k sat-j DNF formulas, read-once formulas over various bases, and deterministic finite automata.
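To make the query model concrete, the following is a minimal sketch (not the paper's own algorithm) of the classic Angluin-style exact learner for monotone DNF, one of the classes listed above. It uses a membership oracle to minimize each positive counterexample into a term, and a simulated equivalence oracle; for a target with t terms it issues exactly t+1 equivalence queries, illustrating how equivalence-query counts translate into mistake bounds. The oracle names and the exhaustive equivalence check are illustrative assumptions for this sketch.

```python
from itertools import product

def learn_monotone_dnf(n, member):
    """Exact learner for monotone DNF over n variables, using a membership
    oracle `member` plus a (here simulated) equivalence oracle.
    A sketch: not the algorithm analyzed in the paper."""
    hypothesis = []  # list of terms; a term is a frozenset of variable indices

    def h_eval(x):
        # Hypothesis DNF: OR over terms, each term an AND of its variables.
        return any(all(x[i] for i in t) for t in hypothesis)

    def equivalence():
        # Simulated equivalence oracle: exhaustively search for a
        # counterexample (fine for small n in a demonstration).
        for x in product([0, 1], repeat=n):
            if h_eval(x) != bool(member(x)):
                return x
        return None

    eq_queries = 0
    while True:
        eq_queries += 1
        x = equivalence()
        if x is None:
            return hypothesis, eq_queries
        # Since the hypothesis only ever contains terms implied by the
        # target, counterexamples are positive.  Minimize the example by
        # flipping 1-bits to 0 while membership still holds; the surviving
        # 1-bits form a term of the target.
        x = list(x)
        for i in range(n):
            if x[i]:
                x[i] = 0
                if not member(tuple(x)):
                    x[i] = 1  # bit was necessary; restore it
        hypothesis.append(frozenset(i for i in range(n) if x[i]))
```

Each counterexample costs at most n membership queries to minimize, so a t-term target is identified with t+1 equivalence queries and O(nt) membership queries, matching the polynomial-membership-query regime the abstract studies.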

Keywords:

History: Available online 25 May 2002.

DOI: https://doi.org/10.1006/jcss.1996.0021