The complexity of theory revision

Authors:

Abstract

A knowledge-based system uses its database (also known as its "theory") to produce answers to the queries it receives. Unfortunately, these answers may be incorrect if the underlying theory is faulty. Standard "theory revision" systems use a given set of "labeled queries" (each a query paired with its correct answer) to transform the given theory, by adding and/or deleting rules and/or antecedents, into a related theory that is as accurate as possible. After formally defining the theory revision task, this paper provides both sample and computational complexity bounds for this process. It first specifies the number of labeled queries necessary to identify, with high probability, a revised theory whose error is close to minimal. It then considers the computational complexity of finding this best theory, and proves that, unless P = NP, no polynomial-time algorithm can identify this optimal revision, even given the exact distribution of queries, except in certain simple situations. It also shows that, except in such simple situations, no polynomial-time algorithm can produce a theory whose error is even close to (i.e., within a particular polynomial factor of) optimal. The first (sample complexity) results suggest reasons why theory revision can be more effective than learning from scratch, while the second (computational complexity) results explain many aspects of standard theory revision systems, including the practice of hill-climbing to a locally-optimal theory based on a given set of labeled queries.
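To make the revision operators and the hill-climbing practice mentioned above concrete, here is a minimal, illustrative Python sketch of a propositional theory reviser. All names (entails, error, revisions, revise) are hypothetical and are not from the paper, which analyzes the complexity of the task rather than prescribing an algorithm; a theory is represented as a list of Horn rules, and revisions add or delete antecedents or delete whole rules.

```python
# Toy, propositional hill-climbing theory reviser, sketched for illustration only.
# A theory is a list of Horn rules: (head_atom, frozenset_of_antecedent_atoms).

def entails(theory, facts, query):
    """Forward-chain over the propositional theory to decide a query."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in theory:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return query in known

def error(theory, labeled_queries):
    """Fraction of labeled queries (facts, query, answer) the theory answers incorrectly."""
    wrong = sum(entails(theory, f, q) != ans for f, q, ans in labeled_queries)
    return wrong / len(labeled_queries)

def revisions(theory, atoms):
    """Single-step revisions: delete a rule, delete an antecedent, or add an antecedent."""
    for i, (head, body) in enumerate(theory):
        yield theory[:i] + theory[i + 1:]                                    # delete rule
        for a in body:
            yield theory[:i] + [(head, body - {a})] + theory[i + 1:]         # drop antecedent
        for a in atoms - body - {head}:
            yield theory[:i] + [(head, body | {a})] + theory[i + 1:]         # add antecedent

def revise(theory, labeled_queries, atoms):
    """Greedy hill-climbing: apply the best single revision until none improves."""
    best, best_err = theory, error(theory, labeled_queries)
    improved = True
    while improved:
        improved = False
        for cand in revisions(best, atoms):
            err = error(cand, labeled_queries)
            if err < best_err:
                best, best_err, improved = cand, err, True
    return best, best_err
```

Consistent with the paper's hardness results, such a greedy procedure can only guarantee a locally-optimal revision with respect to the given labeled queries, not the globally minimum-error theory.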

Keywords: Theory revision, Computational learning theory, Inductive logic programming, Agnostic learning

Article history: Received 10 March 1993; Revised 26 November 1998; Available online 30 June 1999.

DOI: https://doi.org/10.1016/S0004-3702(98)00107-6