A comparative analysis of data preparation algorithms for customer churn prediction: A case study in the telecommunication industry

Authors:

Highlights:

• We study the impact of data preparation on customer churn prediction performance.

• Effective data preparation improves AUC by up to 14.5% and top decile lift by up to 34%.

• Optimized logistic regression is competitive with advanced data mining algorithms.

Abstract

Data preparation is a process that aims to convert independent (categorical and continuous) variables into a form appropriate for further analysis. We examine data-preparation alternatives to enhance the prediction performance of the commonly used logit model. This study, conducted in a churn prediction modeling context, benchmarks an optimized logit model against eight state-of-the-art data mining techniques that use standard input data, including real-world cross-sectional data from a large European telecommunication provider. The results lead to the following conclusions. (i) Analysts should acknowledge that the data-preparation technique they choose actually affects churn prediction performance; we find improvements of up to 14.5% in the area under the receiver operating characteristic curve (AUC) and 34% in the top decile lift. (ii) The enhanced logistic regression is also competitive with more advanced single and ensemble data mining algorithms. This article concludes with some managerial implications and suggestions for further research, including evidence of the generalizability of the results for other business settings.
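The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes the standard definitions: AUC as the probability that a randomly chosen churner is scored above a randomly chosen non-churner, and top decile lift as the churn rate among the 10% highest-scored customers divided by the overall churn rate. The function names `auc` and `top_decile_lift` and the toy data are hypothetical.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one churner and one non-churner")
    # Count churner/non-churner pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_decile_lift(y_true, scores):
    """Churn rate in the top 10% of scores divided by the overall churn rate."""
    n = len(y_true)
    k = max(1, n // 10)  # size of the top decile, at least one customer
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(y_true) / n
    return top_rate / base_rate


# Hypothetical toy data: 20 customers, 4 of whom churn (label 1).
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical model scores, strictly decreasing from 1.0 to 0.05.
toy_scores = [1 - 0.05 * i for i in range(20)]

toy_auc = auc(toy_labels, toy_scores)
toy_lift = top_decile_lift(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}, top decile lift = {toy_lift:.2f}")
```

The pairwise AUC is quadratic in the number of customers, which is acceptable for illustration; a rank-based computation would be preferable on a full churn data set.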

Keywords: Predictive analytics, Data preparation techniques, Churn prediction

Review history: Received 12 April 2016, Revised 24 November 2016, Accepted 27 November 2016, Available online 29 November 2016, Version of Record 3 March 2017.

Official paper URL: https://doi.org/10.1016/j.dss.2016.11.007