Information-theoretic selection of high-dimensional spectral features for structural recognition

Abstract

Pattern recognition methods often deal with samples consisting of thousands of features. Reducing their dimensionality is therefore crucial to make such data sets tractable. Feature selection techniques remove irrelevant and noisy features and select a subset of features that better describes the samples and yields better classification performance. In this paper, we propose a novel feature selection method for supervised classification within an information-theoretic framework. Mutual information is used to measure the statistical relationship between a subset of features and the class labels of the samples. Traditionally, it has been measured only for ranking single features; however, in most data sets the features are not independent, and their combination provides much more information about the class than the sum of their individual predictive power. We analyze several estimation methods that bypass density estimation and estimate entropy and mutual information directly from the set of samples. These methods allow us to efficiently evaluate multivariate sets of thousands of features. Within this framework, we experiment with spectral graph features extracted from 3D shapes. Most existing graph classification techniques rely on graph attributes; we use unattributed graphs to show the contribution of each spectral feature to graph classification. Besides successfully classifying graphs from shapes using only their structure, we test to what extent the set of selected spectral features is robust to perturbations of the data set.
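To make the idea of estimating mutual information "directly from the set of samples" concrete, the following is a minimal sketch, not the authors' method. It assumes one particular density-free estimator, the Kozachenko–Leonenko k-nearest-neighbour entropy estimator, which the abstract does not name, and computes I(S; C) = H(S) − Σ_c p(c) H(S | C = c) for a multivariate feature subset S. The greedy forward search over subsets, the function names (`knn_entropy`, `mutual_information`, `greedy_forward_selection`) and the choice k = 3 are all hypothetical illustrations.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma for a positive integer n: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy(X: np.ndarray, k: int = 3) -> float:
    """Kozachenko-Leonenko k-NN estimate of differential entropy, in nats.

    Assumption: this estimator stands in for the paper's estimators. No density
    is built; only each sample's distance to its k-th nearest neighbour is used.
    X has shape (N, d) and N must exceed k.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, d = X.shape
    # Squared pairwise Euclidean distances; a point is not its own neighbour.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    d2 = np.maximum(d2, 0.0)  # clip tiny negative values from round-off
    # Radius to the k-th nearest neighbour of every sample.
    eps = np.sqrt(np.partition(d2, k - 1, axis=1)[:, k - 1])
    eps = np.maximum(eps, 1e-12)  # guard against duplicate samples
    # log volume of the unit d-ball: pi^(d/2) / Gamma(d/2 + 1)
    log_vd = 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)
    return digamma_int(n) - digamma_int(k) + log_vd + d * float(np.mean(np.log(eps)))


def mutual_information(X: np.ndarray, y: np.ndarray, k: int = 3) -> float:
    """I(S; C) = H(S) - sum_c p(c) H(S | C = c), every term from k-NN distances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    h_joint = knn_entropy(X, k)
    classes, counts = np.unique(y, return_counts=True)
    h_cond = sum(
        (cnt / len(y)) * knn_entropy(X[y == c], k) for c, cnt in zip(classes, counts)
    )
    return h_joint - h_cond


def greedy_forward_selection(X: np.ndarray, y: np.ndarray, n_select: int, k: int = 3):
    """Hypothetical search: add the feature that maximises I(S; C) of the whole subset."""
    selected, scores = [], []
    remaining = list(range(X.shape[1]))
    for _ in range(n_select):
        best_j, best_mi = None, -np.inf
        for j in remaining:
            # Evaluate the subset jointly, not feature j on its own.
            mi = mutual_information(X[:, selected + [j]], y, k)
            if mi > best_mi:
                best_j, best_mi = j, mi
        selected.append(best_j)
        remaining.remove(best_j)
        scores.append(best_mi)
    return selected, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    y = rng.integers(0, 2, n)
    X = rng.normal(0.0, 1.0, size=(n, 3))
    X[:, 0] = 4.0 * y + rng.normal(0.0, 0.5, n)  # the only informative feature
    sel, mi = greedy_forward_selection(X, y, n_select=2)
    print(sel, [round(v, 3) for v in mi])
```

Because each candidate is scored together with the features already chosen, this kind of search can reward combinations whose joint information exceeds what single-feature ranking would suggest. The O(N²d) distance computation is the main cost, so it grows only linearly with the number of features in the subset.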