Robust approach for estimating probabilities in Naïve–Bayes Classifier for gene expression data

Abstract

The Naïve–Bayes Classifier (NBC) is widely used for classification in machine learning. It is often the first choice for classification problems because of its simplicity and its classification accuracy compared with other supervised learning methods. However, for high-dimensional data such as gene expression data, it does not perform well because of two major limitations: underflow and overfitting. The existing approach to underflow is to add the logarithms of probabilities rather than multiplying the probabilities, and the m-estimate approach is used to address overfitting. In practice, however, these approaches do not perform well on gene expression data. In this paper, a novel approach is proposed that overcomes these limitations by using a robust function for estimating probabilities in the Naïve–Bayes Classifier. The proposed method not only resolves the limitations of NBC but also improves classification accuracy for gene expression data. The method has been tested on several high-dimensional benchmark gene expression datasets. Comparative results for the proposed Robust Naïve–Bayes Classifier (R-NBC) and the existing NBC are presented to highlight the effectiveness of R-NBC. A simulation study has also been performed to demonstrate the robustness of R-NBC relative to the existing approaches.
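
To make the underflow limitation concrete, the following is a minimal Python sketch of the two existing remedies the abstract mentions; it is not the proposed R-NBC, whose robust function is not given here. The function names, the gene count of 5000, and the per-gene likelihood values are hypothetical, and the smoothed estimate is written in the common m-estimate form, which is an assumption about what the abstract's estimate approach refers to.

```python
import math


def product_of_probs(probs):
    """Multiply per-feature likelihoods directly (prone to underflow)."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def sum_of_log_probs(probs):
    """Add logarithms of the likelihoods instead of multiplying them."""
    return sum(math.log(p) for p in probs)


def m_estimate(count, total, prior, m):
    """Smoothed probability estimate (assumed m-estimate form).

    count: samples of the class having the feature value
    total: samples of the class
    prior: prior estimate of the probability
    m:     equivalent sample size (weight given to the prior)
    """
    return (count + m * prior) / (total + m)


# Hypothetical setting: 5000 genes, one small likelihood per gene.
n_genes = 5000
class_a = [0.01] * n_genes
class_b = [0.02] * n_genes

# Direct products underflow, so the two classes cannot be compared.
print(product_of_probs(class_a), product_of_probs(class_b))

# Log sums stay finite and preserve the ordering of the classes.
print(sum_of_log_probs(class_a), sum_of_log_probs(class_b))

# A zero count no longer produces a zero probability.
print(m_estimate(count=0, total=10, prior=0.5, m=2))
```

In this sketch both direct products collapse to zero in double precision, while the log sums remain finite and comparable. The abstract's claim is that, even with these remedies in place, NBC still performs poorly on gene expression data, which is the motivation for the robust estimator.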

Keywords: Classification, Naïve–Bayes, Gene expression data

Review history: Available online 13 July 2010.

Paper URL: https://doi.org/10.1016/j.eswa.2010.06.076