An effective approach to candidate retrieval for cross-language plagiarism detection: A fusion of conceptual and keyword-based schemes

Authors:

Highlights:

• Proposing a fusion approach for retrieving candidates in cross-lingual plagiarism detection that combines conceptual and keyword-based retrieval models.

• Providing a dynamic fusion measure that sets a separate interpolation factor for each suspicious document sample.

• Comprehensively assessing the performance of conceptual models, especially the ESA model, in retrieving candidates for cross-language plagiarism detection.

• Studying the impact of applying the peak-and-plateau strategy as a post-processing step on the output of the proposed retrieval model.

• Comparing the performance of the proposed model with state-of-the-art approaches to cross-lingual candidate retrieval.

Keywords: Plagiarism detection, Cross-language plagiarism, Candidate retrieval, Conceptual model, Keyword-based model

Review history: Received 16 November 2018, Revised 24 September 2019, Accepted 23 October 2019, Available online 1 November 2019, Version of Record 1 November 2019.

Official URL: https://doi.org/10.1016/j.ipm.2019.102150