Automatic detection of Long Method and God Class code smells through neural source code embeddings

Authors:

Highlights:

• We compare machine learning approaches against heuristics for code smell detection.

• We use metrics and code embeddings as code representations for machine learning.

• We test the performance of smell detectors on a large manually labeled dataset.

• CuBERT code embeddings outperform all code smell detection alternatives.

• We perform an error analysis to discuss the advantages of the CuBERT approach.

Keywords: Code smell detection, Neural source code embeddings, Code metrics, Machine learning, Software engineering

Review history: Received 28 July 2021, Revised 9 May 2022, Accepted 15 May 2022, Available online 19 May 2022, Version of Record 22 May 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.117607