Out-of-bag estimation of the optimal sample size in bagging

Abstract

The performance of m-out-of-n bagging with and without replacement is analyzed as a function of the sampling ratio (m/n), where m is the size of each sample drawn from a training set of n examples. Standard bagging uses resampling with replacement to generate bootstrap samples of the same size as the original training set ($m_{\mathrm{wr}} = n$). Without-replacement methods typically use half samples ($m_{\mathrm{wor}} = n/2$). These choices of sample size are arbitrary and need not be optimal in terms of the classification performance of the ensemble. We propose to use the out-of-bag estimates of the generalization accuracy to select a near-optimal value for the sampling ratio. Ensembles of classifiers trained on independent samples whose size is chosen so that the out-of-bag error of the ensemble is as low as possible generally improve on the performance of standard bagging and can be built efficiently.
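The selection procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`draw_sample`, `fit_stump`, `oob_error`, `select_ratio`) are hypothetical, decision stumps stand in for full decision trees to keep the code short and dependency-free, and the grid of candidate ratios and the toy data are arbitrary. For each candidate ratio m/n, an ensemble is trained on m-out-of-n samples (with or without replacement), each training example is classified by majority vote over only the members whose sample did not contain it, and the ratio with the lowest resulting out-of-bag error is selected.

```python
import random
from collections import Counter


def draw_sample(n, m, replace, rng):
    """Return indices of an m-out-of-n sample of the training set.

    With replacement, m may exceed n; without replacement, m must be <= n
    (random.Random.sample raises ValueError otherwise).
    """
    if replace:
        return [rng.randrange(n) for _ in range(m)]
    return rng.sample(range(n), m)


def fit_stump(X, y, idx):
    """Fit a depth-1 decision tree on rows idx, minimizing training errors.

    Hypothetical stand-in for the full decision trees used as base learners.
    """
    total = Counter(y[i] for i in idx)
    majority = total.most_common(1)[0][0]
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][f])
        left = Counter()
        for a, b in zip(order, order[1:]):
            left[y[a]] += 1
            if X[a][f] == X[b][f]:
                continue  # no threshold separates equal feature values
            right = total - left  # Counter subtraction keeps positive counts
            yl, cl = left.most_common(1)[0]
            yr, cr = right.most_common(1)[0]
            err = len(idx) - cl - cr
            if best is None or err < best[0]:
                best = (err, f, (X[a][f] + X[b][f]) / 2, yl, yr)
    if best is None:  # every sampled row has identical features
        return lambda x: majority
    _, f, t, yl, yr = best
    return lambda x: yl if x[f] <= t else yr


def oob_error(X, y, ratio, replace, n_trees=50, seed=0):
    """Out-of-bag error of an ensemble whose members each see m = round(ratio * n) rows.

    Each example is classified by majority vote over only the members whose
    sample did not contain it. Returns None if no example is ever out of bag
    (e.g. ratio = 1 without replacement, where every sample is the full set).
    """
    rng = random.Random(seed)
    n = len(X)
    m = max(1, round(ratio * n))
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = draw_sample(n, m, replace, rng)
        model = fit_stump(X, y, idx)
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:
                votes[i][model(X[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        return None
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)


def select_ratio(X, y, ratios, replace, **kwargs):
    """Pick the sampling ratio m/n with the lowest out-of-bag error."""
    errors = {r: oob_error(X, y, r, replace, **kwargs) for r in ratios}
    valid = {r: e for r, e in errors.items() if e is not None}
    return min(valid, key=valid.get), errors


if __name__ == "__main__":
    # Synthetic toy data (an assumption for illustration only).
    rng = random.Random(42)
    X = [[rng.random(), rng.random()] for _ in range(200)]
    y = [int(x0 + x1 > 1.0) for x0, x1 in X]
    ratios = [0.1, 0.2, 0.3, 0.5, 0.7, 0.9]
    for replace in (True, False):
        best, errors = select_ratio(X, y, ratios, replace, n_trees=30)
        label = "with" if replace else "without"
        print(f"{label} replacement: best m/n = {best}")
```

Once a ratio is selected, the final ensemble would be trained on samples of size m = round(best · n). The stump base learner, ensemble size, ratio grid, and toy data are illustrative choices, not taken from the paper.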

Keywords: Bagging, Subagging, Bootstrap sampling, Subsampling, Optimal sampling ratio, Ensembles of classifiers, Decision trees

History: Received 21 November 2008, revised 21 March 2009, accepted 17 May 2009, available online 28 May 2009.

Official paper URL: https://doi.org/10.1016/j.patcog.2009.05.010