A novel virtual sample generation method based on Gaussian distribution

Authors:

Highlights:

Abstract

Traditional machine learning algorithms often generalize poorly when trained on noisy, imbalanced, or small-sample training sets. This work proposes a novel virtual sample generation (VSG) method based on the Gaussian distribution. First, the method determines the mean and the standard error of a Gaussian distribution. Virtual samples are then drawn from this distribution. Finally, a new training set is constructed by adding the virtual samples to the original training set. The work shows that training on the new training set is equivalent to a form of regularization for small-sample problems, and to cost-sensitive learning for imbalanced-sample problems. Experiments show that, given a suitable number of virtual sample replicates, classifiers trained on the new training sets can generalize better than those trained on the original training sets.
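
The three steps in the abstract (fix a Gaussian, draw virtual samples, append them to the training set) can be illustrated with a minimal sketch. The abstract does not say how the mean and spread are chosen, so the choices here are assumptions rather than the authors' procedure: each original sample serves as the mean, a single hypothetical parameter `sigma` is used as the standard deviation in every dimension, and each virtual sample inherits the label of the sample it was drawn around. The function name `generate_virtual_samples` and its parameters are likewise illustrative.

```python
import numpy as np


def generate_virtual_samples(X, y, n_replicates=3, sigma=0.1, seed=0):
    """Return the original samples followed by Gaussian virtual samples.

    Assumption (not from the paper): every virtual sample is drawn from
    N(x_i, sigma^2 * I), centred on an original sample x_i, and keeps
    that sample's label y_i.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Step 1: fix the Gaussian parameters. Each original sample is
    # repeated n_replicates times and used as a mean.
    means = np.repeat(X, n_replicates, axis=0)
    labels = np.repeat(y, n_replicates, axis=0)

    # Step 2: draw one virtual sample per repeated mean.
    virtual = rng.normal(loc=means, scale=sigma)

    # Step 3: build the new training set (originals first, then virtual).
    X_new = np.vstack([X, virtual])
    y_new = np.concatenate([y, labels])
    return X_new, y_new


# Usage: two original samples, three virtual replicates each.
X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1])
X_new, y_new = generate_virtual_samples(X, y, n_replicates=3, sigma=0.1)
print(X_new.shape, y_new.shape)
```

Under these assumptions, `n_replicates` plays the role of the number of virtual sample replicates mentioned in the abstract. For an imbalanced set, one plausible variation would be to call the function only on the minority-class rows, or with a larger `n_replicates` for them.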

Keywords: Virtual sample, Regularization theory, Cost-sensitive learning, Gaussian distribution, Prior knowledge

Review history: Received 15 January 2010, Revised 20 December 2010, Accepted 25 December 2010, Available online 31 December 2010.

Official paper URL: https://doi.org/10.1016/j.knosys.2010.12.010