Regression with small data sets: a case study using code surrogates in additive manufacturing

Authors: Chandrika Kamath, Ya Ju Fan

Abstract

Recent years have seen increasing interest in mining massive data sets whose sizes are measured in terabytes. However, there are some problems where collecting even a single data point is very expensive, resulting in data sets with only tens or hundreds of samples. One such problem is that of building code surrogates, where a computer simulation is run using many different values of the input parameters and a regression model is built to relate the outputs of the simulation to the inputs. A good surrogate can be very useful in sensitivity analysis, uncertainty analysis, and in designing experiments, but the cost of running expensive simulations at many sample points can be high. In this paper, we use a problem from the domain of additive manufacturing to show that even with small data sets we can build good-quality surrogates by appropriately selecting the input samples and the regression algorithm. Our work is broadly applicable to simulations in other domains, and the ideas proposed can be used in time-constrained machine learning tasks, such as hyper-parameter optimization.
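
The surrogate-building workflow described in the abstract (sample the inputs, run the simulation at each sample, fit a regression model from inputs to outputs) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: `toy_simulation` is a hypothetical stand-in for an expensive simulation, Latin hypercube sampling is one possible way of selecting input samples, and a quadratic least-squares fit is one possible regression algorithm. The abstract does not say which sampling schemes or regression methods the authors actually compare.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """Stratified sample in [0, 1)^n_dims with one point per stratum in each dimension."""
    # Jitter inside each of n_samples equal-width strata, then shuffle strata per dimension.
    points = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    for d in range(n_dims):
        points[:, d] = rng.permutation(points[:, d])
    return points

def toy_simulation(x):
    """Hypothetical stand-in for an expensive simulation (two inputs, one output)."""
    return 1.0 + 2.0 * x[:, 0] - 3.0 * x[:, 1] + 1.5 * x[:, 0] * x[:, 1]

def quadratic_features(x):
    """Design matrix with intercept, linear, interaction, and squared terms."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), x1, x2, x1 * x2, x1**2, x2**2])

def fit_surrogate(x, y):
    """Fit a quadratic least-squares surrogate and return it as a callable."""
    coef, *_ = np.linalg.lstsq(quadratic_features(x), y, rcond=None)
    return lambda x_new: quadratic_features(x_new) @ coef

rng = np.random.default_rng(0)

# A small data set: only 30 simulation runs.
x_train = latin_hypercube(30, 2, rng)
y_train = toy_simulation(x_train)
surrogate = fit_surrogate(x_train, y_train)

# Compare the surrogate with the simulation at points not used for fitting.
x_test = rng.random((200, 2))
max_abs_error = np.max(np.abs(surrogate(x_test) - toy_simulation(x_test)))
print(f"max abs error on held-out points: {max_abs_error:.2e}")
```

The toy function lies exactly in the span of the quadratic features, so the held-out error here is near machine precision. A real simulation would not be so accommodating, which is why the choice of samples and regression algorithm matters when only a few runs are affordable.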

Keywords: Regression, Sampling, Code surrogates, Small data sets

Paper URL: https://doi.org/10.1007/s10115-018-1174-1