Subsampling for partial least-squares regression via an influence function

Authors:

Highlights:

Abstract

Partial least squares (PLS) performs well for high-dimensional regression problems, where the number of predictors can far exceed the number of observations. Like many other supervised learning techniques, PLS was developed in the framework of empirical risk minimization, which typically assumes that the test and training data are drawn from the same distribution. Any violation of this assumption can degrade PLS performance. Subsampling via an influence function is a recently developed and promising technique for addressing this problem. However, influence functions are only guaranteed to be accurate for sufficiently small changes to the model, limiting their application to small-scale datasets. To overcome this obstacle, a new form of the influence function for PLS is derived, and a framework of subsampling via an influence function for PLS is developed. Results on four simulation datasets and two real-world datasets, compared against classic PLS and two other subsampling frameworks, illustrate the effectiveness of the method.
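The core idea of influence-based subsampling can be sketched in a few lines. The toy below is an assumption-laden illustration, not the paper's PLS derivation: it uses a ridge least-squares fit as a stand-in for PLS and the standard first-order influence approximation (removing a training point changes the validation loss by roughly a term proportional to the inverse-Hessian-weighted inner product of the point's gradient with the validation gradient). Points whose removal most reduces validation loss, e.g. points affected by distribution drift, are dropped before refitting. All names and the drift setup are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative assumption, not the paper's simulation design):
# linear data where part of the training set suffers distribution drift.
n, p = 200, 5
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + 0.1 * rng.normal(size=n)
y[:40] += 3.0  # drifted/corrupted training points

X_val = rng.normal(size=(50, p))
y_val = X_val @ beta + 0.1 * rng.normal(size=50)

lam = 1e-3  # small ridge term keeps the Hessian invertible


def fit(X, y):
    """Ridge least-squares fit (stand-in for a PLS fit)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


w = fit(X, y)
H = 2.0 * (X.T @ X + lam * np.eye(p))         # Hessian of the total squared loss
g_val = -2.0 * X_val.T @ (y_val - X_val @ w)  # gradient of the validation loss

# First-order influence: removing point i changes the validation loss by
# roughly (1/n) * g_val^T H^{-1} grad_i, where grad_i is point i's gradient.
grads = -2.0 * X * (y - X @ w)[:, None]       # per-point gradients, shape (n, p)
influence = grads @ np.linalg.solve(H, g_val)

# Subsample: drop the 20% of points whose removal most reduces the
# validation loss (the most negative influence values), then refit.
keep = influence >= np.quantile(influence, 0.2)
w_sub = fit(X[keep], y[keep])

mse_full = np.mean((y_val - X_val @ w) ** 2)
mse_sub = np.mean((y_val - X_val @ w_sub) ** 2)
print(f"kept {keep.sum()} of {n} points; val MSE {mse_full:.4f} -> {mse_sub:.4f}")
```

On this toy data the drifted points carry large residuals and hence strongly negative influence, so the subsampled refit typically beats the full fit on the held-out set; the paper's contribution is a form of this influence function derived specifically for the PLS estimator.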

Keywords: Influence function, Partial least squares, Distribution drift, Subsampling

Article history: Received 10 May 2021, Revised 25 February 2022, Accepted 23 March 2022, Available online 29 March 2022, Version of Record 8 April 2022.

DOI: https://doi.org/10.1016/j.knosys.2022.108661