Is cross-linguistic advert flaw detection in Wikipedia feasible? A multilingual-BERT-based transfer learning approach

Authors:

Highlights:

• Introduce transfer learning for cross-linguistic Wikipedia advert detection.

• English Wikipedia samples can be used to detect adverts in non-English Wikipedias.

• Multilingual BERT is a suitable encoder for cross-linguistic transfer learning.

• The proposed fine-tuning transfer performs best across different dataset scales.

Keywords: Wikipedia quality flaw, Cross-lingual transfer learning, Pretraining language model, Text classification

Review history: Received 18 January 2022, Revised 7 June 2022, Accepted 24 June 2022, Available online 30 June 2022, Version of Record 8 July 2022.

Official paper URL: https://doi.org/10.1016/j.knosys.2022.109330