Pasting Small Votes for Classification in Large Databases and On-Line

Author: Leo Breiman

Abstract

Many databases have grown to the point where they cannot fit into the fast memory of even large-memory machines, to say nothing of current workstations. Because the usual methods for constructing predictors require that all data be held in fast memory, using these databases to predict various characteristics calls for work-arounds. This paper studies one class of such methods that is computationally fast and gives accuracy comparable to what could have been obtained if all the data could have been held in core. The procedure takes small pieces of the data, grows a predictor on each piece, and then pastes these predictors together. A version is given that scales up to terabyte data sets. The methods are also applicable to on-line learning.
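
The procedure described in the abstract (small pieces, one predictor per piece, predictors pasted together) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the pieces are drawn uniformly at random, the base learner is a toy nearest-centroid classifier, the predictors are combined by an unweighted majority vote, and the names `fit_centroids`, `paste_votes`, `predict_pasted`, and `make_data` are hypothetical. The paper's own sampling scheme, base predictor, and combination rule may differ.

```python
import random
from collections import Counter


def fit_centroids(piece):
    """Toy base learner (an assumption, not from the paper): per-class feature means."""
    sums, counts = {}, Counter()
    for x, y in piece:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroids(model, x):
    """Predict the class whose centroid is closest to x (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: sqdist(model[y]))


def paste_votes(data, piece_size, n_pieces, rng):
    """Grow one small predictor per piece; pieces are uniform random samples (an assumption)."""
    return [fit_centroids(rng.sample(data, piece_size)) for _ in range(n_pieces)]


def predict_pasted(models, x):
    """Paste the small predictors together with an unweighted majority vote."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]


def make_data(n, rng):
    """Synthetic two-class data: Gaussian blobs centred at (0, 0) and (3, 3)."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        center = 0.0 if label == 0 else 3.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


rng = random.Random(0)
train = make_data(2000, rng)
test = make_data(500, rng)

# Each predictor sees only 50 of the 2000 training examples.
models = paste_votes(train, piece_size=50, n_pieces=25, rng=rng)
accuracy = sum(predict_pasted(models, x) == y for x, y in test) / len(test)
print(f"pasted-vote accuracy: {accuracy:.3f}")
```

In this sketch the whole training set sits in a Python list for simplicity. In the setting the abstract describes, each piece would instead be read from storage, so that only one small piece needs to be in fast memory at a time.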

Keywords: combining, database, votes, pasting

Paper URL: https://doi.org/10.1023/A:1007563306331