Software defect prediction based on correlation weighted class association rule mining

Abstract

Software defect prediction based on supervised learning plays a crucial role in guiding the allocation of software testing resources. Associative classification is particularly attractive for this task because it offers both high accuracy and comprehensible rules. However, owing to the inherently imbalanced data distribution, mining tends to generate a large number of non-defective class association rules, while defective class association rules are easily overlooked. Furthermore, classical associative classification algorithms measure the interestingness of rules mainly by occurrence frequency, using support and confidence, without considering the importance of features, so insignificant frequent itemsets end up being combined into rules. This limitation motivated weighted associative classification. However, feature weighting based on domain knowledge is subjective and unsuitable for high-dimensional datasets. Hence, we present a novel software defect prediction model based on correlation weighted class association rule mining (CWCAR). To handle class imbalance, it uses a framework based on multiple weighted supports rather than the traditional support-confidence framework, and it assigns feature weights with a correlation-based heuristic. In addition, the ranking, pruning, and prediction stages are optimized based on weighted support. Results show that CWCAR is significantly superior to state-of-the-art classifiers in terms of Balance, MCC, and G-mean.
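The abstract names Balance, MCC, and G-mean without defining them. The sketch below computes all three from a binary confusion matrix, assuming the definitions commonly used in the defect prediction literature (pd is the probability of detection, pf the probability of false alarm); the abstract does not confirm that the paper uses exactly these forms, and the function names are hypothetical.

```python
import math


def rates(tp, fp, tn, fn):
    """Return (pd, pf): recall on the defective class and false-alarm rate."""
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return pd, pf


def balance(tp, fp, tn, fn):
    """Assumed definition: 1 minus the normalized Euclidean distance
    from (pf, pd) to the ideal point (pf=0, pd=1)."""
    pd, pf = rates(tp, fp, tn, fn)
    return 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)


def g_mean(tp, fp, tn, fn):
    """Assumed definition: geometric mean of pd and (1 - pf)."""
    pd, pf = rates(tp, fp, tn, fn)
    return math.sqrt(pd * (1.0 - pf))


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Illustrative counts only: 10 defective and 90 non-defective modules.
    counts = dict(tp=8, fp=10, tn=80, fn=2)
    print(balance(**counts), g_mean(**counts), mcc(**counts))
```

All three measures take both classes into account, which is why they are preferred over plain accuracy on imbalanced defect datasets: a classifier that labels every module as non-defective gets high accuracy but a G-mean and MCC of 0.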

Keywords: Software defect prediction, Associative classification, Class imbalance, Attribute weighting, Apriori, Association rule

Review history: Received 25 November 2019, Revised 22 February 2020, Accepted 6 March 2020, Available online 23 March 2020, Version of Record 16 April 2020.

Paper URL: https://doi.org/10.1016/j.knosys.2020.105742