Knowing what doesn't matter: exploiting the omission of irrelevant data

Authors:

Abstract

Most learning algorithms work most effectively when their training data contain completely specified labeled samples. In many diagnostic tasks, however, the data will include the values of only some of the attributes; we model this as a blocking process that hides the values of those attributes from the learner. While blockers that remove the values of critical attributes can handicap a learner, this paper instead focuses on blockers that remove only conditionally irrelevant attribute values, i.e. values that are not needed to classify an instance, given the values of the other unblocked attributes. We first motivate and formalize this model of “superfluous-value blocking”, and then demonstrate that these omissions can be useful, by proving that certain classes that seem hard to learn in the general PAC model—viz., decision trees and DNF formulae—are trivial to learn in this setting. We then extend this model to deal with (1) theory revision (i.e. modifying an existing formula); (2) blockers that occasionally include superfluous values or exclude required values; and (3) other corruptions of the training data.

Keywords: Irrelevant values, Blocked attributes, Learnability, Decision trees, DNF, Diagnosis, Theory revision, Adversarial noise

Article history: Available online 19 May 1998.

DOI: https://doi.org/10.1016/S0004-3702(97)00048-9