Monte Carlo comparisons of selected clustering procedures

Authors:

Abstract

Monte Carlo methods were used to estimate the percent misclassification of 13 clustering methods for six types of parameterizations of two bivariate normal populations. The clustering methods were compared by using the probabilities of misclassification and incidence matrices. It was determined that correlations and differences in population sizes adversely influenced all clustering methods, whereas differences in the variance structure did not appreciably affect the results. The k-means partitioning method was the best method overall. Considering only agglomerative methods, the sum of squares, variance, furthest neighbor and rank score methods were generally superior to the other non-partitioning methods considered. The poorest methods overall were judged to be nearest neighbor and maximum likelihood. However, as the complexity of the distributions increased, the differences between all of the methods decreased.
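The kind of experiment the abstract describes can be illustrated with a minimal sketch: draw two bivariate normal samples, cluster the pooled points, and average the misclassification rate over many replications. This is not the authors' procedure. The function names, the use of a plain Lloyd's-style 2-means as the only clustering method, and every parameter value (sample sizes, mean separation `delta`, correlation `rho`, number of replications) are assumptions chosen for illustration only.

```python
import math
import random


def sample_bivariate_normal(rng, n, mean, sd, rho):
    """Draw n points from a bivariate normal with common sd and correlation rho."""
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mean[0] + sd * z1
        y = mean[1] + sd * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        points.append((x, y))
    return points


def two_means(points, rng, max_iter=50):
    """Plain Lloyd's algorithm with k = 2; returns a 0/1 label per point."""
    centers = rng.sample(points, 2)  # random initial centers (assumption)
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for px, py in points:
            d0 = (px - centers[0][0]) ** 2 + (py - centers[0][1]) ** 2
            d1 = (px - centers[1][0]) ** 2 + (py - centers[1][1]) ** 2
            new_labels.append(0 if d0 <= d1 else 1)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


def misclassification_rate(true_labels, cluster_labels):
    """Fraction misclassified, taking the better of the two label matchings."""
    n = len(true_labels)
    errors = sum(t != c for t, c in zip(true_labels, cluster_labels))
    return min(errors, n - errors) / n


def estimate_misclassification(n1, n2, delta, rho, reps=200, seed=0):
    """Monte Carlo estimate of the mean misclassification rate of 2-means."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        pop1 = sample_bivariate_normal(rng, n1, (0.0, 0.0), 1.0, rho)
        pop2 = sample_bivariate_normal(rng, n2, (delta, 0.0), 1.0, rho)
        truth = [0] * n1 + [1] * n2
        total += misclassification_rate(truth, two_means(pop1 + pop2, rng))
    return total / reps


if __name__ == "__main__":
    for rho in (0.0, 0.8):
        rate = estimate_misclassification(50, 50, delta=3.0, rho=rho)
        print(f"rho={rho}: estimated misclassification = {rate:.3f}")
```

Varying `rho` or making `n1` and `n2` unequal in this sketch is one way to explore the sensitivity to correlation and population size that the abstract reports. Because cluster labels are arbitrary, the rate is computed against the better of the two possible label matchings, so it can never exceed 0.5.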

Keywords: Monte Carlo simulation, Clustering methods, Discriminant function, Multivariate analysis, Probabilities of misclassification

Review history: Received 12 March 1979, Revised 6 August 1979, Available online 19 May 2003.

Paper URL: https://doi.org/10.1016/0031-3203(80)90002-3