Hardness of approximate two-level logic minimization and PAC learning with membership queries

Abstract

Producing a small DNF expression consistent with given data is a classical problem in computer science that occurs in a number of forms and has numerous applications. We consider two standard variants of this problem. The first is two-level logic minimization, or finding a minimum DNF formula consistent with a given complete truth table (TT-MinDNF). This problem was formulated by Quine in 1952 and has since been one of the key problems in logic design. It was proved NP-complete by Masek in 1979. The best-known polynomial-time approximation algorithm is based on a reduction to the SET-COVER problem and produces a DNF formula of size O(d⋅OPT), where d is the number of variables. We prove that TT-MinDNF is NP-hard to approximate within d^γ for some constant γ > 0, establishing the first inapproximability result for the problem. The other DNF minimization problem we consider is PAC learning of DNF expressions when the learning algorithm must output a DNF expression as its hypothesis (referred to as proper learning). We prove that DNF expressions are NP-hard to PAC learn properly even when the learner has access to membership queries, thereby answering a long-standing open question due to Valiant [L.G. Valiant, A theory of the learnable, Comm. ACM 27 (11) (1984) 1134–1142]. Finally, we provide a concrete connection between these two variants of the DNF minimization problem. Specifically, we prove that inapproximability of TT-MinDNF implies hardness results for restricted proper learning of DNF expressions with membership queries, even when learning with respect to the uniform distribution only.
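The O(d⋅OPT) upper bound cited above follows from the standard reduction of TT-MinDNF to SET-COVER: the universe is the ON-set of f, the candidate sets are the implicants of f (subcubes of {0,1}^d containing no 0-point of f), and the greedy set-cover algorithm's ln|U| + 1 ≤ ln(2^d) + 1 = O(d) approximation factor yields the quoted bound. Below is a minimal sketch of that reduction, not taken from the paper; the function name greedy_min_dnf and the encoding of terms as tuples over {0, 1, '*'} are illustrative choices, and the exhaustive enumeration of all 3^d candidate terms is only feasible for small d.

```python
from itertools import product

def greedy_min_dnf(d, truth_table):
    """Greedy SET-COVER heuristic for TT-MinDNF (illustrative sketch).

    truth_table: dict mapping each d-bit tuple to 0 or 1 (complete table).
    Returns a list of terms over {0, 1, '*'} whose disjunction computes f.
    """
    on_set = {x for x, v in truth_table.items() if v == 1}

    def points(term):
        # All assignments covered by a term; '*' means the variable is free.
        axes = [(0, 1) if b == '*' else (b,) for b in term]
        return set(product(*axes))

    # Candidate sets: implicants of f, i.e. subcubes containing no 0-point.
    implicants = [(t, points(t)) for t in product((0, 1, '*'), repeat=d)
                  if all(truth_table[p] for p in points(t))]

    # Greedy set cover of the ON-set; yields at most (ln 2^d + 1) * OPT
    # = O(d * OPT) terms, matching the bound quoted in the abstract.
    cover, uncovered = [], set(on_set)
    while uncovered:
        term, pts = max(implicants, key=lambda tp: len(tp[1] & uncovered))
        cover.append(term)
        uncovered -= pts
    return cover

# Example: f(x1, x2) = x1 OR x2 as a full truth table over d = 2 variables.
tt = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
print(greedy_min_dnf(2, tt))  # two implicants covering the ON-set
```

On this example the greedy cover uses two terms, which is optimal; the paper's hardness result says that, in general, no polynomial-time algorithm can guarantee a d^γ approximation factor unless P = NP.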

Keywords: Proper learning, Membership query, NP-hardness of learning, DNF, Truth table, Representation-based hardness

Article history: Received 3 May 2007; Revised 17 April 2008; Available online 18 July 2008.

Official URL: https://doi.org/10.1016/j.jcss.2008.07.007