A connected network-regularized logistic regression model for feature selection

Authors: Lingyu Li, Zhi-Ping Liu

Abstract

Feature selection on a network structure can not only discover interesting variables but also uncover their intricate interactions. Regularization is often employed to ensure the sparsity and smoothness of the coefficients in logistic regression. However, currently available methods fail to embed network connectivity in their regularization penalty functions. In this paper, we propose a connected network-regularized logistic regression (CNet-RLR) model for feature selection that accounts for structural connectivity in a network. Mathematically, it is a convex optimization problem constrained by inequalities reflecting network connectivity. Because the Lasso penalty is non-differentiable, we construct an equivalent formulation of CNet-RLR by introducing auxiliary variables. An interior-point algorithm is designed to compute the solutions efficiently. Theoretically, we prove the grouping effect and oracle property of the solutions and guarantee algorithmic convergence. On both synthetic simulation data and real-world uterine corpus endometrial carcinoma (UCEC) genomics data, we validate that the CNet-RLR model efficiently identifies connected-network-structured features that can serve as diagnostic biomarkers. In a comparison study, we also show that the proposed CNet-RLR model achieves better classification performance and feature interpretability than other regularized logistic regression (RLR) alternatives and another graph-embedded feature selection model.

Keywords: Regularized logistic regression, Feature selection, Network-based sparse penalty, Network connectivity, Biomarker discovery

Paper URL: https://doi.org/10.1007/s10489-021-02877-3