Ultra-Sparse Classifiers Through Minimizing the VC Dimension in the Empirical Feature Space

Authors: Jayadeva, Mayank Sharma, Sumit Soman, Himanshu Pant

Abstract

Sparse representations have gained much interest due to the rapid growth of intelligent embedded systems, the need to reduce the time required to mine large datasets, and the desire to reduce the footprint of recognition-based applications on portable devices. Computational learning theory tells us that the Vapnik–Chervonenkis (VC) dimension of a learning model directly impacts its structural risk and generalization ability. The minimal complexity machine (MCM) was recently proposed as a way to learn a hyperplane classifier by minimizing a tight bound on the VC dimension; results show that it learns very sparse representations while achieving test set accuracies comparable to the state-of-the-art. The MCM formulation works in the primal itself, both when the classifier is learnt in the input space and when it is learnt implicitly in a higher dimensional feature space. In the latter case, the hyperplane is constructed in the empirical feature space (EFS), a finite dimensional vector space spanned by the image vectors in the higher dimensional feature space. In this paper, we examine the hyperplane restricted to the EFS. Since the VC dimension of a linear hyperplane classifier is exactly the number of features it uses, the dimension of the EFS is a direct measure of both the sparsity of the model and its VC dimension. This allows us to formulate optimization problems that learn sparse representations and yet generalize well. We derive an EFS version of the MCM that allows us to minimize model complexity and improve sparsity, and we also propose a novel least squares version of the MCM in the EFS. Experimental results demonstrate that the EFS variants yield sparse models with generalization comparable to the state-of-the-art.
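The abstract describes the EFS as the finite dimensional span of the training images in the feature space; a standard way to measure its dimension is the numerical rank of the kernel Gram matrix on the training set. The sketch below illustrates this idea only, not the paper's MCM formulation; the function names `rbf_gram` and `efs_dimension` and the RBF kernel choice are illustrative assumptions.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) for rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clamp tiny negatives from round-off

def efs_dimension(K, tol=1e-8):
    """Dimension of the empirical feature space = numerical rank of K."""
    s = np.linalg.svd(K, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# Toy data: 50 points in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
K = rbf_gram(X, gamma=0.5)
print(efs_dimension(K))  # at most 50, the number of training points
```

A sparser model in this view is one that uses a hyperplane depending on fewer of these empirical feature directions, which is what ties the EFS dimension to both sparsity and the VC dimension in the abstract.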

Keywords: Empirical feature space, VC dimension, Minimal complexity machines, Support vector machines, Model complexity, Sparsity

DOI: https://doi.org/10.1007/s11063-018-9793-9