An Experimental and Theoretical Comparison of Model Selection Methods

Authors: Michael Kearns, Yishay Mansour, Andrew Y. Ng, Dana Ron

Abstract

We investigate the problem of model selection in the setting of supervised learning of boolean functions from independent random examples. More precisely, we compare methods for finding a balance between the complexity of the hypothesis chosen and its observed error on a random training sample of limited size, when the goal is that of minimizing the resulting generalization error. We undertake a detailed comparison of three well-known model selection methods — a variation of Vapnik's Guaranteed Risk Minimization (GRM), an instance of Rissanen's Minimum Description Length Principle (MDL), and (hold-out) cross validation (CV). We introduce a general class of model selection methods (called penalty-based methods) that includes both GRM and MDL, and provide general methods for analyzing such rules. We provide both controlled experimental evidence and formal theorems to support the following conclusions:
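The three methods compared in the abstract can be contrasted in miniature: a penalty-based rule (like GRM or MDL) picks the complexity level minimizing observed training error plus a complexity penalty, while hold-out cross validation picks the complexity with lowest error on a withheld sample. The sketch below is illustrative only; the penalty functions and error values are assumptions for demonstration, not the paper's exact rules.

```python
import math

def grm_style_penalty(d, m):
    # Assumed GRM-style penalty, loosely resembling sqrt(d/m)
    # uniform-convergence bounds; the paper's exact rule differs.
    return math.sqrt(d / m)

def mdl_style_penalty(d, m):
    # Assumed MDL-style penalty: hypothesis description length in
    # bits, normalized by sample size (illustrative form only).
    return d * math.log2(m) / m

def penalty_based_select(train_errors, m, penalty):
    """Pick complexity d minimizing training error plus penalty."""
    return min(train_errors, key=lambda d: train_errors[d] + penalty(d, m))

def holdout_cv_select(holdout_errors):
    """Pick complexity d minimizing error on a held-out sample."""
    return min(holdout_errors, key=holdout_errors.get)

# Illustrative numbers: training error shrinks monotonically with
# complexity d, while held-out error is U-shaped (overfitting past d=3).
m = 100
train_errors   = {1: 0.30, 2: 0.18, 3: 0.10, 4: 0.08, 5: 0.07}
holdout_errors = {1: 0.32, 2: 0.20, 3: 0.12, 4: 0.16, 5: 0.22}

d_grm = penalty_based_select(train_errors, m, grm_style_penalty)
d_mdl = penalty_based_select(train_errors, m, mdl_style_penalty)
d_cv  = holdout_cv_select(holdout_errors)
```

On this toy data all three rules agree, but the paper's point is precisely that on real learning problems they can diverge, each with characteristic failure modes.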

Keywords: model selection, complexity regularization, cross validation, minimum description length principle, structural risk minimization, VC dimension

Paper URL: https://doi.org/10.1023/A:1007344726582