Learning from Batched Data: Model Combination Versus Data Combination

作者：Kai Ming Ting, Boon Toh Low, Ian H. Witten

摘要

Combining models learned from multiple batches of data provide an alternative to the common practice of learning one model from all the available data (i.e. the data combination approach). This paper empirically examines the base-line behavior of the model combination approach in this multiple-data-batches scenario. We find that model combination can lead to better performance even if the disjoint batches of data are drawn randomly from a larger sample, and relate the relative performance of the two approaches to the learning curve of the classifier used. In the beginning of the curve, model combination has higher bias and variance than data combination and thus a higher error rate. As training data increases, model combination has either a lower error rate than or a comparable performance to data combination because the former achieves larger variance reduction. We also show that this result is not sensitive to the methods of model combination employed. Another interesting result is that we empirically show that the near-asymptotic performance of a single model in some classification tasks can be significantly improved by combining multiple models (derived from the same algorithm) in the multiple-data-batches scenario.

论文关键词：Model combination, data combination, empirical evaluation, learning curve, near-asymptotic performance

论文评审过程：

论文官网地址：https://doi.org/10.1007/BF03325092