Classification of weld flaws with imbalanced class data

Authors:

Abstract

This paper investigates the imbalanced data problem in classifying different types of weld flaws, a multi-class classification problem. The one-against-all scheme is adopted for multi-class classification, and three algorithms are employed as classifiers: minimum distance, nearest neighbors, and fuzzy nearest neighbors. The effectiveness of 22 data preprocessing methods for dealing with imbalanced data is evaluated against eight criteria to determine whether any method dominates the others. The test results indicate that: (1) the nearest neighbor classifiers outperform the minimum distance classifier; (2) some data preprocessing methods do not improve any criterion, and which methods these are varies from one classifier to another; (3) the combination of the AHC_KM data preprocessing method with the 1-NN classifier performs best, producing the best results in six of the eight evaluation criteria; and (4) the most difficult weld flaw type to recognize is crack.
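To make the one-against-all scheme concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It trains one binary scorer per class (that class against all others) and assigns each test sample to the class whose scorer is most confident. The function names, the 2-D toy features, and the labels "porosity" and "slag" are assumptions made for illustration; only "crack" is named in the abstract. None of the 22 preprocessing methods, including AHC_KM, is reproduced here.

```python
import numpy as np

def min_distance_scores(X_train, y_binary, X_test):
    """Binary minimum distance scorer (illustrative).
    A positive score means the sample is closer to the positive-class centroid."""
    pos_centroid = X_train[y_binary == 1].mean(axis=0)
    neg_centroid = X_train[y_binary == 0].mean(axis=0)
    d_pos = np.linalg.norm(X_test - pos_centroid, axis=1)
    d_neg = np.linalg.norm(X_test - neg_centroid, axis=1)
    return d_neg - d_pos

def one_nn_scores(X_train, y_binary, X_test):
    """Binary 1-NN scorer (illustrative).
    Score is +1 if the nearest training sample is positive, else -1."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return np.where(y_binary[nearest] == 1, 1.0, -1.0)

def one_against_all_predict(X_train, y_train, X_test, binary_scorer):
    """Build one 'class c vs. rest' problem per class and pick the highest score."""
    classes = np.unique(y_train)
    scores = np.column_stack([
        binary_scorer(X_train, (y_train == c).astype(int), X_test)
        for c in classes
    ])
    return classes[scores.argmax(axis=1)]

# Hypothetical, deliberately imbalanced toy data: 5 / 2 / 1 samples per class.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5],
                    [5.0, 5.0], [5.0, 6.0],
                    [10.0, 0.0]])
y_train = np.array(["porosity"] * 5 + ["slag"] * 2 + ["crack"])
X_test = np.array([[0.1, 0.2], [5.2, 5.4], [9.0, 1.0]])

pred_md = one_against_all_predict(X_train, y_train, X_test, min_distance_scores)
pred_nn = one_against_all_predict(X_train, y_train, X_test, one_nn_scores)
print(pred_md)
print(pred_nn)
```

On this well-separated toy set both scorers agree. With a single training sample for the minority class, as here, any per-class statistic rests on very little evidence, which is the situation the preprocessing methods evaluated in the paper are meant to address.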

Keywords: Multi-class classification, One-against-all, Weld flaws, Imbalanced data, Minimum distance classifier, K nearest neighbors, Fuzzy k-nearest neighbors

Review process: Available online 15 August 2007.

Official paper URL: https://doi.org/10.1016/j.eswa.2007.08.044