Covering Cubes by Random Half Cubes, with Applications to Binary Neural Networks

Authors:

Highlights:

Abstract

Let Q_n be the (hyper)cube {−1, 1}^n. This paper is concerned with the following question: how many vectors must be chosen uniformly and independently at random from Q_n before every vector in Q_n has negative inner product with at least one of the random vectors? For any fixed ε > 0, a simple expectation argument shows that for all sufficiently large n, (1 + ε)n random vectors suffice with high probability. In this paper we prove that there are ε, ρ > 0 such that (1 − ε)n random vectors are also enough, and such that at least ρn random vectors are necessary.

This problem is partially motivated by neural network problems. Neural networks are being used to solve a growing number of difficult problems such as speech recognition, handwriting recognition, and protein structure prediction. Recently, for both theoretical and practical reasons, neural networks with binary weights (binary neural networks) have attracted much attention. In spite of considerable analysis based on statistical mechanics, the following two basic questions about binary neural networks have remained unanswered.

Q1. Is there a positive constant ρ such that for all sufficiently large n there is a binary neural network of n neurons which can separate ρn (unbiased) random patterns with probability close to 1?

Q2. Is it possible for a binary neural network of n neurons to separate (1 − o(1))n random patterns with probability greater than some positive constant? (Here o(1) goes to 0 as n goes to infinity.) This question is raised because no binary neural network of n neurons can separate more than n random patterns with probability close to 1.

Our results yield the answers "YES" to Q1 and "NO" to Q2.
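For readers who want to probe the covering question stated above, here is a minimal Monte Carlo sketch in Python (assuming NumPy). For small n it exhaustively checks whether m random ±1 vectors give every cube vector a negative inner product with at least one of them. The function names and the particular choices of n, m, and trial counts are illustrative, not from the paper, and the asymptotic thresholds (1 − ε)n and ρn concern large n, so they need not be visible at this toy scale.

```python
import itertools
import numpy as np

def all_cube_vectors(n):
    """All 2^n vectors of {-1, 1}^n as rows of a matrix (small n only)."""
    return np.array(list(itertools.product((-1, 1), repeat=n)))

def estimate_cover_probability(n, m, trials=200, seed=0):
    """Monte Carlo estimate of the probability that m vectors chosen
    uniformly and independently from {-1, 1}^n cover the cube, i.e.
    every cube vector has negative inner product with at least one of them."""
    rng = np.random.default_rng(seed)
    X = all_cube_vectors(n)                      # (2^n, n), every cube vector
    hits = 0
    for _ in range(trials):
        R = rng.choice([-1, 1], size=(m, n))     # the m random vectors
        covered = (X @ R.T < 0).any(axis=1)      # is each cube vector covered?
        hits += bool(covered.all())              # success iff the whole cube is covered
    return hits / trials

if __name__ == "__main__":
    # Exhaustive over the cube, so keep n small.
    n = 12
    for m in (n, 3 * n // 2, 2 * n):
        p = estimate_cover_probability(n, m)
        print(f"n={n}, m={m}: estimated covering probability = {p:.2f}")
```

The check is vectorized: the matrix product X @ R.T computes all 2^n × m inner products at once, so each trial is a single NumPy call rather than a loop over cube vectors.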

Keywords:

Article history: Received 27 October 1995, Revised 24 October 1997, Available online 25 May 2002.

Paper URL: https://doi.org/10.1006/jcss.1997.1560