Comments on supervised feature selection by clustering using conditional mutual information-based distances

Authors:

Highlights:

Abstract

Supervised feature selection is an important problem in pattern recognition. Among the many methods introduced, those based on mutual information and conditional mutual information measures are among the most widely adopted. In this paper, we re-analyze an interesting paper on this topic recently published by Sotoca and Pla (Pattern Recognition, Vol. 43, Issue 6, June 2010, pp. 2068–2081). In that work, a method for supervised feature selection is proposed that clusters the features into groups using a conditional mutual information-based distance measure. The clustering procedure minimizes an objective function named the minimal relevant redundancy (mRR) criterion, which is claimed to be an upper bound on the information loss incurred when the full set of features is replaced by a smaller subset. We have found that the proof of this proposition rests on certain erroneous assumptions, and that the proposition itself is not true in general. To remedy the reported work, we characterize the specific conditions under which the assumptions used in the proof, and hence the proposition, hold true. In particular, we find that there is a reasonable condition, namely that all features are independent given the class variable (as assumed by the popular naive Bayes classifier), under which the assumptions required by Sotoca and Pla's framework do hold.
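The central quantity behind both the distance measure and the class-conditional independence condition mentioned above is the conditional mutual information I(Xi; Xj | C). The following is a minimal illustrative sketch (not the paper's implementation; function and variable names are our own) of estimating I(Xi; Xj | C) from discrete data. If this quantity is close to zero for every pair of features, the naive-Bayes-style condition discussed in the abstract is (approximately) satisfied.

    # Illustrative sketch only: empirical conditional mutual information
    # I(Xi; Xj | C) for discrete variables, estimated from joint counts.
    # Names (xi, xj, c) are hypothetical and not taken from the paper.
    from collections import Counter
    from math import log2

    def conditional_mutual_information(xi, xj, c):
        """Estimate I(Xi; Xj | C) = sum_{x,y,z} p(x,y,z) *
        log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) )
        from three equal-length sequences of discrete values."""
        n = len(c)
        n_xyz = Counter(zip(xi, xj, c))   # joint counts of (Xi, Xj, C)
        n_xz = Counter(zip(xi, c))        # joint counts of (Xi, C)
        n_yz = Counter(zip(xj, c))        # joint counts of (Xj, C)
        n_z = Counter(c)                  # counts of C
        cmi = 0.0
        for (x, y, z), cnt in n_xyz.items():
            # count ratios: (n_z * n_xyz) / (n_xz * n_yz) equals
            # p(z) p(x,y,z) / (p(x,z) p(y,z)) since the 1/n factors cancel
            cmi += (cnt / n) * log2((n_z[z] * cnt) / (n_xz[(x, z)] * n_yz[(y, z)]))
        return cmi

    # Toy usage: here Xi and Xj are independent given C, so I(Xi; Xj | C) = 0,
    # which is the condition under which the mRR bound can be salvaged.
    if __name__ == "__main__":
        xi = [0, 0, 1, 1, 0, 1, 0, 1]
        xj = [0, 1, 0, 1, 0, 1, 1, 0]
        c  = [0, 0, 0, 0, 1, 1, 1, 1]
        print(round(conditional_mutual_information(xi, xj, c), 4))  # prints 0.0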

Keywords: Feature selection, Conditional mutual information, Mutual information properties, Clustering, Classification, Naive Bayes classifier

Article history: Received 27 February 2012, Revised 27 August 2012, Accepted 1 November 2012, Available online 7 November 2012.

Article URL: https://doi.org/10.1016/j.patcog.2012.11.001