Effect of dimensionality and estimation on the performance of Gaussian classifiers

Abstract

The measurement dimensionality that maximizes the average (over possible training sets) probability of correct classification (Pcr) is investigated for the equiprobable two-class Gaussian problem with known common covariance matrix. The classification rule considered is the Bayes minimum error rule in which the estimated (sample) mean vectors are used in place of the true mean vectors. A basic question investigated is how the Mahalanobis distance between the underlying distributions must vary with dimensionality to keep Pcr constant. Numerical results are plotted for several cases. Analytical results are obtained which relate the rate of variation of the Mahalanobis distance with dimensionality to the corresponding asymptotic behaviour of Pcr. Results for more highly structured problems, involving specific covariance matrices, show that in some cases increasing correlation between the measurements yields higher values of Pcr. Approximate expressions are derived relating Pcr, dimensionality, training sample size and the structure of the underlying probability density.
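The setting described above can be explored numerically. The following is a minimal Monte Carlo sketch, not the paper's own procedure. It assumes an identity covariance matrix, true means placed at ±delta/2 on the first coordinate (so the Mahalanobis distance equals delta in every dimension), and equal training sample sizes per class. The function names `normal_cdf` and `average_pcr` and all parameter names are illustrative assumptions.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def average_pcr(dim, n_train, delta, n_trials=2000, seed=0):
    """Monte Carlo estimate of Pcr averaged over training sets.

    Assumptions (illustrative, not taken from the paper): identity
    covariance, true means at +/- delta/2 on the first axis, and
    n_train training samples per class.
    """
    rng = random.Random(seed)
    mu1 = [delta / 2.0] + [0.0] * (dim - 1)
    mu2 = [-delta / 2.0] + [0.0] * (dim - 1)
    # The sample mean of n_train draws from N(mu, I) is N(mu, I / n_train).
    sd = 1.0 / math.sqrt(n_train)
    total = 0.0
    for _ in range(n_trials):
        m1 = [rng.gauss(mu, sd) for mu in mu1]
        m2 = [rng.gauss(mu, sd) for mu in mu2]
        # Plug-in linear rule: choose class 1 when w . (x - mid) > 0.
        w = [a - b for a, b in zip(m1, m2)]
        mid = [(a + b) / 2.0 for a, b in zip(m1, m2)]
        norm_w = math.sqrt(sum(wi * wi for wi in w))
        if norm_w == 0.0:
            total += 0.5  # degenerate rule, equivalent to guessing
            continue
        # Given the estimated means, w . (x - mid) is Gaussian with unit
        # variance after scaling by norm_w, so the conditional probability
        # of correct classification has a closed form.
        s1 = sum(wi * (mu - c) for wi, mu, c in zip(w, mu1, mid)) / norm_w
        s2 = sum(wi * (mu - c) for wi, mu, c in zip(w, mu2, mid)) / norm_w
        total += 0.5 * (normal_cdf(s1) + normal_cdf(-s2))
    return total / n_trials


if __name__ == "__main__":
    for dim in (1, 2, 5, 10, 20):
        print(dim, round(average_pcr(dim, n_train=10, delta=2.0), 3))
```

In this sketch delta is held fixed, so the added coordinates carry no discriminating information and only contribute estimation noise. Passing a delta that grows with `dim` would correspond to the case the abstract examines, where the Mahalanobis distance varies with dimensionality.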

Keywords: Pattern classification, Multivariate Gaussian distribution, Average probability of correct classification, Bayes classification rule, Maximum likelihood estimation, Mahalanobis distance, Design sample, Peaking phenomenon, Optimum number of measurements, Correlation coefficient