Just-in-time defect prediction based on AST change embedding

Authors:

Highlights:

Abstract

Just-in-time (JIT) defect prediction helps developers quickly identify whether a change is defective. The features extracted from changes play an essential role in building an accurate prediction model. In recent years, code representation techniques have been considered effective for extracting semantic features from software code files. However, extracting semantic information from the fragmentary code snippets that make up a change remains a challenging problem. We propose a new feature for representing code semantics based on abstract syntax trees (ASTs), called ACE (AST Change Embedding). ACE compares the AST of the source code before and after a change, extracts AST change sequences, and maps them to numeric vectors using word embedding. We also use a gating mechanism to build a gated hierarchical model, called GH-ACE, that combines traditional hand-crafted features with semantic features. We conduct experiments on within-project and cross-project defect prediction tasks and evaluate the proposed model in both non-effort-aware and effort-aware scenarios. The results show that, on average, our model outperforms the best baseline method by 4.0 percent for within-project defect prediction and by 2.4 percent for cross-project defect prediction.
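The idea of an AST change sequence can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python source and uses the standard `ast` and `difflib` modules as stand-ins for whatever parser and tree-differencing tool the paper uses. The function names and the `ADD_`/`DEL_` token scheme are hypothetical. The resulting tokens are the kind of input a word embedding model could then map to vectors.

```python
import ast
import difflib
from typing import List


def node_types(source: str) -> List[str]:
    """Return the AST node type names of `source` in ast.walk order."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def ast_change_sequence(before: str, after: str) -> List[str]:
    """Diff two node-type sequences and emit hypothetical change tokens.

    Assumption: a flat sequence diff approximates a real tree diff.
    """
    old, new = node_types(before), node_types(after)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes: List[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Nodes present only in the old version are marked as deleted.
        if tag in ("delete", "replace"):
            changes += [f"DEL_{name}" for name in old[i1:i2]]
        # Nodes present only in the new version are marked as added.
        if tag in ("insert", "replace"):
            changes += [f"ADD_{name}" for name in new[j1:j2]]
    return changes


if __name__ == "__main__":
    print(ast_change_sequence("x = a + b\n", "x = a - b\n"))
    # ['DEL_Add', 'ADD_Sub']
```

A flat diff over node types loses the tree structure that a dedicated AST differencing tool would preserve, so this only conveys the shape of the data, not the quality of the features reported in the paper.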

Keywords: Just-in-time defect prediction, Code representation, Abstract syntax tree, Word embedding

Review history: Received 28 November 2021, Revised 14 April 2022, Accepted 15 April 2022, Available online 25 April 2022, Version of Record 10 May 2022.

Official paper URL: https://doi.org/10.1016/j.knosys.2022.108852