A novel similarity classifier with multiple ideal vectors based on k-means clustering

Authors:

Highlights:

• Introduction of a novel similarity classifier with multiple ideal vectors.

• Ideal vectors are determined with k-means clustering and the jump method.

• The model is tested on three artificial and three real-world data sets.

• The novel classifier is often significantly better than the standard similarity classifier.

Abstract

In the literature, researchers and practitioners can find a multitude of algorithms for performing a classification task. The similarity classifier is one of the more recently suggested classification algorithms. In this paper, we suggest a novel similarity classifier with multiple ideal vectors per class, which are generated with k-means clustering in combination with the jump method. Two approaches to pre-processing are presented: simple standardization, and principal component analysis in combination with the MAP test and Parallel Analysis. On the artificial data sets, the novel classifier with standardization and with transformation power Y = 1 for the jump method achieves significantly higher mean classification accuracies than the standard classifier. The results on the artificial data sets demonstrate that, in contrast to the standard similarity classifier, the novel approach can cope with more complex data structures. For the real-world credit data sets, the novel similarity classifier with standardization and Y = 1 achieves competitive results or even outperforms the k-nearest neighbour classifier, the Naive Bayes algorithm, decision trees, random forests and the standard similarity classifier.
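The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it rests on several assumptions that the abstract does not state: features are already scaled to [0, 1], similarity is the mean of 1 - |x_d - v_d| over the dimensions, the k-means routine is a plain Lloyd iteration with random initial centroids, and the jump method picks the number of clusters k with the largest increase in d_k^(-Y), where d_k is the per-dimension distortion. All function names and the toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, per-dimension distortion)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse / (len(points) * len(points[0]))

def jump_select(points, k_max, power=1.0):
    """Return the centroids for the k with the largest jump in d_k ** -power."""
    best_centroids, best_jump = None, float("-inf")
    previous = 0.0  # the transformed distortion for k = 0 is defined as 0
    for k in range(1, min(k_max, len(points)) + 1):
        centroids, distortion = kmeans(points, k)
        transformed = max(distortion, 1e-12) ** (-power)  # guard against d_k = 0
        if transformed - previous > best_jump:
            best_centroids, best_jump = centroids, transformed - previous
        previous = transformed
    return best_centroids

def similarity(x, v):
    """Assumed similarity for features in [0, 1]: mean of 1 - |x_d - v_d|."""
    return sum(1 - abs(xd - vd) for xd, vd in zip(x, v)) / len(x)

def fit(X, y, k_max=3, power=1.0):
    """One set of ideal vectors (cluster centroids) per class label."""
    return {
        label: jump_select([x for x, t in zip(X, y) if t == label], k_max, power)
        for label in sorted(set(y))
    }

def predict(ideals, x):
    """Assign x to the class that owns the most similar ideal vector."""
    return max(ideals, key=lambda lab: max(similarity(x, v) for v in ideals[lab]))

# Hypothetical toy data: class "A" sits in two opposite corners, class "B" in
# the middle, so a single mean vector for "A" would land on top of "B".
corner = [(-0.02, 0.0), (0.0, 0.02), (0.02, -0.02), (0.0, 0.0)]
X = [(0.1 + dx, 0.1 + dy) for dx, dy in corner]
X += [(0.9 + dx, 0.9 + dy) for dx, dy in corner]
X += [(0.5 + dx, 0.5 + dy) for dx, dy in corner]
y = ["A"] * 8 + ["B"] * 4

ideals = fit(X, y)
predictions = [predict(ideals, p) for p in [(0.1, 0.1), (0.9, 0.9), (0.5, 0.5)]]
```

In this toy setting the class "A" samples form two well-separated groups, so keeping several ideal vectors per class lets both corners be matched, which is the kind of more complex data structure the abstract refers to. With very small clusters the jump criterion can over-split a class, which is why `k_max` is kept low here.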

Keywords: Supervised classification, Jump method, Principal component analysis, MAP test, Parallel Analysis

Review history: Received 19 October 2017, Revised 23 February 2018, Accepted 19 April 2018, Available online 24 April 2018, Version of Record 14 June 2018.

Official URL: https://doi.org/10.1016/j.dss.2018.04.003