Improving Recall of software defect prediction models using association mining

Authors:

Highlights:

Abstract

The use of software product metrics in defect prediction studies highlights the utility of these metrics. The public availability of software defect data based on product metrics has led to the development of defect prediction models. These models are limited in their ability to learn Defect-prone (D) modules because the available datasets are imbalanced: most are dominated by Not Defect-prone (ND) modules. This imbalance reduces the ability of classification models to learn the D modules accurately. This paper presents an association mining based approach that allows defect prediction models to learn D modules in imbalanced datasets. The proposed algorithm preprocesses the data by setting specific metric values as missing, which improves the prediction of D modules. The algorithm was evaluated on 5 public datasets. A Naive Bayes (NB) classifier was built before and after the proposed preprocessing, and the Recall of the classifier improved after preprocessing. The stability of the approach was tested by running the algorithm with different numbers of bins. The results show a performance gain of up to 40%.
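To make the role of Recall on imbalanced data concrete, the following is a minimal sketch and not the paper's algorithm. It uses the standard definition, Recall = TP / (TP + FN), for the D class. The 90/10 class split, the two sets of predictions, and the function names are hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive="D"):
    """Return (tp, fn, fp, tn), treating `positive` as the class of interest."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def recall(y_true, y_pred, positive="D"):
    """Fraction of actual `positive` modules that were predicted as `positive`."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all modules whose predicted label matches the true label."""
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


# Hypothetical imbalanced dataset: 90 ND modules followed by 10 D modules.
y_true = ["ND"] * 90 + ["D"] * 10

# A degenerate model that always predicts the majority class.
always_nd = ["ND"] * 100

# A model that finds 6 of the 10 D modules and raises no false alarms.
finds_some_d = ["ND"] * 90 + ["D"] * 6 + ["ND"] * 4

for name, y_pred in [("always ND", always_nd), ("finds some D", finds_some_d)]:
    print(name, "accuracy:", accuracy(y_true, y_pred),
          "recall(D):", recall(y_true, y_pred))
```

In this made-up example the majority-class model reaches high accuracy while detecting no D modules at all, which is why Recall on the D class, rather than accuracy, is the measure the abstract focuses on.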

Keywords: Software defect prediction, Naive Bayes, PROMISE repository, Imbalanced data, Improving Recall, Association mining

Review history: Received 10 January 2015, Revised 4 October 2015, Accepted 6 October 2015, Available online 23 October 2015, Version of Record 8 November 2015.

Official paper URL: https://doi.org/10.1016/j.knosys.2015.10.009