A mathematical programming approach for integrated multiple linear regression subset selection and validation

Authors:

Highlights:

• A mixed-integer quadratic programming-based framework that integrates subset selection and model validation is proposed.

• The proposed model finds the best subset in multiple linear regression that minimizes the mean squared error and satisfies all considered diagnostics simultaneously.

• An efficient cut for relaxed coefficient t-tests that significantly reduces solution time is proposed.

• An algorithmic approach is proposed to find a quality alternative subset when every subset violates the proposed validations.
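To make the selection criterion in the highlights concrete, the following is a minimal sketch of "lowest mean squared error among subsets that pass validation". It is not the paper's mixed-integer quadratic program: it enumerates every subset by brute force, which is only practical for a handful of candidate variables. The function name `best_valid_subset`, the fixed threshold `t_crit=2.0` (a rough stand-in for a 5% two-sided t-test), the definition of MSE as SSE divided by residual degrees of freedom, and the use of coefficient t-tests as the only diagnostic are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def best_valid_subset(X, y, t_crit=2.0):
    """Brute-force illustration (hypothetical helper, not the paper's MIQP).

    Returns (mse, columns) for the subset with the lowest mean squared error
    among subsets whose slope coefficients all satisfy |t| >= t_crit,
    or None if no subset passes.
    """
    n, p = X.shape
    best = None
    for k in range(1, p + 1):
        for cols in combinations(range(p), k):
            # Design matrix: intercept column plus the chosen predictors.
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            dof = n - A.shape[1]
            if dof <= 0:
                continue  # not enough observations to estimate the error variance
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ beta
            mse = float(resid @ resid) / dof  # assumed definition: SSE / dof
            # Standard errors from the usual OLS covariance estimate.
            cov = mse * np.linalg.pinv(A.T @ A)
            t_stats = beta / np.sqrt(np.diag(cov))
            # Validation step: every slope (intercept excluded) must be significant.
            if np.all(np.abs(t_stats[1:]) >= t_crit):
                if best is None or mse < best[0]:
                    best = (mse, cols)
    return best
```

Returning `None` corresponds to the case named in the last highlight, where every subset violates the validations. The sketch does not attempt the alternative-subset procedure the authors propose for that case.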


Keywords: Regression diagnostics, Subset selection, Mathematical programming

Review history: Received 21 February 2020, Revised 9 July 2020, Accepted 24 July 2020, Available online 25 July 2020, Version of Record 5 August 2020.

Official paper URL: https://doi.org/10.1016/j.patcog.2020.107565