Is my stance the same as your stance? A cross validation study of stance detection datasets

Authors:

Highlights:

• Cross-dataset stance detection models do not generalize well.

• Model generalizability can be improved by aggregating datasets.

• It is hard to ascertain the amount of extra data needed to fine-tune models trained on aggregated datasets.

• Possible reasons for poor model performance and generalizability are that texts are not easily differentiable by stance, and that annotations are not consistent within or across datasets.

• Differences in model performance arise from indifferentiable texts and inconsistent stances.

Keywords: Stance detection, Natural language processing, Cross validation, Machine learning, Twitter

Article history: Received 10 July 2022, Revised 15 August 2022, Accepted 22 August 2022, Available online 5 September 2022, Version of Record 5 September 2022.

Official paper URL: https://doi.org/10.1016/j.ipm.2022.103070