Constraint Score: A new filter method for feature selection with pairwise constraints

Abstract

Feature selection is an important preprocessing step in mining high-dimensional data. Supervised feature selection methods generally outperform unsupervised ones because they exploit supervision information. In the literature, nearly all existing supervised feature selection methods use class labels as supervision information. In this paper, we propose to use another form of supervision information for feature selection, namely pairwise constraints, which specify whether a pair of data samples belongs to the same class (must-link constraints) or to different classes (cannot-link constraints). Pairwise constraints arise naturally in many tasks and are more practical and less expensive to obtain than class labels. This topic has not yet been addressed in feature selection research. We call our pairwise-constraint-guided feature selection algorithm Constraint Score and compare it with the well-known Fisher Score and Laplacian Score algorithms. Experiments are carried out on several high-dimensional UCI and face data sets. The results show that, with very few pairwise constraints, Constraint Score achieves similar or even higher performance than Fisher Score using full class labels on the whole training data, and significantly outperforms Laplacian Score.
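
The abstract does not state the scoring formula, so the following is only a minimal sketch of how a filter method might rank features from pairwise constraints. It assumes a ratio-style criterion: for each feature, the squared differences over must-link pairs are divided by the squared differences over cannot-link pairs, and a lower value is taken as better. The function names (`constraint_ratio_scores`, `rank_features`), the `eps` guard, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two samples in X


def constraint_ratio_scores(
    X: Sequence[Sequence[float]],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
    eps: float = 1e-12,  # assumed guard against division by zero
) -> List[float]:
    """Return one score per feature; lower is assumed to be better."""
    n_features = len(X[0])
    scores = []
    for r in range(n_features):
        # Spread of feature r across pairs that should be in the same class.
        within = sum((X[i][r] - X[j][r]) ** 2 for i, j in must_link)
        # Spread of feature r across pairs that should be in different classes.
        between = sum((X[i][r] - X[j][r]) ** 2 for i, j in cannot_link)
        scores.append(within / (between + eps))
    return scores


def rank_features(scores: Sequence[float]) -> List[int]:
    """Return feature indices ordered from best (lowest score) to worst."""
    return sorted(range(len(scores)), key=lambda r: scores[r])


# Toy data (hypothetical): feature 0 separates the two groups, feature 1 does not.
X = [
    [0.0, 5.0],
    [0.1, 1.0],
    [1.0, 4.0],
    [1.1, 1.5],
]
must_link = [(0, 1), (2, 3)]    # pairs assumed to share a class
cannot_link = [(0, 2), (1, 3)]  # pairs assumed to differ in class

scores = constraint_ratio_scores(X, must_link, cannot_link)
ranking = rank_features(scores)
print(ranking)
```

Only four constraints are used here and no class labels, which mirrors the setting described in the abstract: the supervision consists solely of statements about pairs of samples.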

Keywords: Feature selection, Pairwise constraints, Filter method, Constraint Score, Fisher Score, Laplacian Score

Review history: Received 17 April 2007, Revised 9 October 2007, Accepted 12 October 2007, Available online 17 October 2007.

Official paper URL: https://doi.org/10.1016/j.patcog.2007.10.009