Is my stance the same as your stance? A cross validation study of stance detection datasets

Authors:

Highlights:

• Cross-dataset stance detection models do not generalize well.

• Model generalizability can be improved by aggregating datasets.

• It is hard to ascertain the amount of extra data needed to fine-tune aggregated-dataset models.

• Possible reasons for poor model performance/generalizability are that texts are not easily differentiable by stance, and that annotations are not consistent within or across datasets.

• Model performance differences are due to indifferentiable texts and inconsistent stances.

Keywords: Stance detection, Natural language processing, Cross validation, Machine learning, Twitter

Review history: Received 10 July 2022, Revised 15 August 2022, Accepted 22 August 2022, Available online 5 September 2022, Version of Record 5 September 2022.

Official paper URL: https://doi.org/10.1016/j.ipm.2022.103070