Data pre-processing pipeline generation for AutoETL

Authors:

Highlights:

• A study of the impact of pre-processing on a set of classification algorithms.

• A method to generate effective pre-processing pipeline prototypes.

• A method for automatic pipeline instantiation as a step towards AutoETL.

• A meta-learning approach to warm-start the pipeline instantiation.

• A comprehensive set of experiments that show the effectiveness of the proposed method.


Keywords: Data pre-processing pipelines, Data analytics

Review history: Received 28 June 2021, Revised 30 September 2021, Accepted 13 November 2021, Available online 3 December 2021, Version of Record 12 May 2022.

Official paper URL: https://doi.org/10.1016/j.is.2021.101957