On clustering categories of categorical predictors in generalized linear models

Authors:

Highlights:

• The paper proposes a method to cluster categorical features in Generalized Linear Models.

• The proposed approach uses a numerical method guided by the learning performance.

• The underlying structure of the categories and their relationship is identified using proximity graphs.

• Model complexity is reduced, and accuracy is competitive with the benchmark one-hot encoding of categorical features.
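To make the complexity claim in the highlights concrete, the sketch below compares the number of indicator columns produced by one-hot encoding with the number left after categories are merged into clusters. It is an illustration only, not the paper's search procedure: the data, the `grouping` dictionary, and the helper names `one_hot` and `cluster_levels` are hypothetical, and the grouping is picked by hand rather than found by a search guided by learning performance.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], levels: Sequence[str]) -> List[List[int]]:
    """Return one indicator column per level (no reference level is dropped)."""
    return [[1 if v == level else 0 for level in levels] for v in values]


def cluster_levels(values: Sequence[str], grouping: Dict[str, str]) -> List[str]:
    """Map each original category to the label of its cluster."""
    return [grouping[v] for v in values]


# Hypothetical data: one categorical predictor with five categories.
regions = ["north", "south", "east", "west", "centre", "north", "east"]
levels = sorted(set(regions))

# Hypothetical grouping, chosen by hand for illustration only.
grouping = {
    "north": "A", "east": "A",
    "south": "B", "west": "B", "centre": "B",
}

# Benchmark encoding: one column per original category.
full = one_hot(regions, levels)

# Clustered encoding: one column per cluster of categories.
clustered_values = cluster_levels(regions, grouping)
clustered = one_hot(clustered_values, sorted(set(grouping.values())))

print(len(full[0]), "columns with one-hot encoding")
print(len(clustered[0]), "columns after clustering")
```

Either matrix could then be used as the design matrix of a GLM. The clustered version needs fewer coefficients for this predictor, which is the sense in which complexity is reduced.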

Keywords: Statistical learning, Interpretability, Greedy randomized adaptive search procedure, Proximity between categories

Review history: Received 3 March 2020, Revised 11 February 2021, Accepted 17 May 2021, Available online 24 May 2021, Version of Record 3 June 2021.

Paper URL: https://doi.org/10.1016/j.eswa.2021.115245