slr-kit: A semi-supervised machine learning framework for systematic literature reviews

Authors:

Highlights:

Abstract

Conducting a Systematic Literature Review (SLR) is a challenging task because of the large number of papers that typically make up the scientific literature on the topic under review. Recently, considerable research effort has been devoted to automating, at least partially, the stages of an SLR. This paper proposes the design and implementation of a workflow and a set of tools, called slr-kit, to support key tasks in an SLR. The proposed approach follows a semi-supervised strategy: time-consuming processes are carried out by automatic tools, whereas manual tasks are optimized through carefully designed support tools that reduce the overall effort required. Important parts of the workflow include the extraction of key terms directly from the abstracts of the papers to survey, and the subsequent topic modeling, which enables a thematic clustering of the corpus of papers. In the proposed workflow, the former task is carried out with a novel tool called FAst WOrd Classifier (FAWOC). The latter is designed to be carried out automatically by an ad hoc solution based on the Latent Dirichlet Allocation (LDA) algorithm. The result of the process is a set of statistics on the relationships among papers and topics, and on their publication trends in journals and conference proceedings. The validity of the method is demonstrated by applying it to a dataset from the scientific field of NLP, while its accuracy is assessed through manual examination of the results by domain experts.
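The two-stage idea described in the abstract (keep only terms judged relevant, then cluster papers by topic with LDA) can be illustrated with a minimal sketch. This is not the slr-kit implementation: the toy corpus, the `RELEVANT_TERMS` set standing in for manually classified terms, the function names, and the hyperparameters are all assumptions made for illustration, and the LDA inference shown here is a plain collapsed Gibbs sampler written with the standard library only.

```python
import random
from collections import Counter

# Hypothetical toy corpus: each string stands in for a paper abstract.
ABSTRACTS = [
    "topic modeling with latent dirichlet allocation for document clustering",
    "latent dirichlet allocation topic modeling of scientific abstracts",
    "neural machine translation with attention",
    "attention based neural machine translation models",
]

# Hypothetical set of terms a reviewer marked as relevant
# (an assumption standing in for the output of the manual term classification step).
RELEVANT_TERMS = {
    "topic", "modeling", "latent", "dirichlet", "allocation",
    "neural", "machine", "translation", "attention",
}


def tokenize(text, keep):
    """Lowercase, split on whitespace, and keep only the relevant terms."""
    return [w for w in text.lower().split() if w in keep]


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # topic counts per document
    topic_word = [Counter() for _ in range(n_topics)]   # word counts per topic
    topic_total = [0] * n_topics                        # token counts per topic
    assign = []                                         # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word


docs = [tokenize(a, RELEVANT_TERMS) for a in ABSTRACTS]
theta, topic_word = lda_gibbs(docs, n_topics=2)

# Assign each paper to its most probable topic (a simple thematic clustering).
dominant = [max(range(len(row)), key=row.__getitem__) for row in theta]
for abstract, topic in zip(ABSTRACTS, dominant):
    print(topic, abstract)
```

Filtering the vocabulary before topic modeling keeps generic words out of the topics, which is the motivation the abstract gives for placing term extraction ahead of LDA. On a real corpus, a tested library implementation of LDA would normally replace the hand-written sampler.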

Keywords: Natural language processing, Topic modeling, Systematic literature review, Tagging, Performance evaluation

Article history: Received 11 April 2022, Revised 10 June 2022, Accepted 11 June 2022, Available online 17 June 2022, Version of Record 23 June 2022.

Paper URL: https://doi.org/10.1016/j.knosys.2022.109266