Two-stage optimization for machine learning workflow

Authors:

Highlights:

• The importance of optimizing the data pipeline relative to hyperparameter tuning is studied.

• The results show data pipelines are often more important than hyperparameter tuning.

• A two-stage optimization process is proposed to search for an ML workflow.

• This process is empirically validated under several time allocation policies.

• Iterative and adaptive policies are more robust than static policies.

• A metric is proposed to measure whether a data pipeline is independent of the model.

Keywords: Data pipelines, Hyperparameter tuning, AutoML, CASH

Review history: Received 30 June 2019, Revised 16 October 2019, Accepted 3 December 2019, Available online 9 December 2019, Version of Record 10 June 2020.

Official paper URL: https://doi.org/10.1016/j.is.2019.101483