Detecting ethnicity-targeted hate speech in Russian social media texts

Authors:

Highlights:

• We present a three-class instance-based approach to detect ethnicity-targeted hate speech in Russian social media texts;

• We show that ethnicity-targeted hate speech is more effectively addressed with the new three-class approach;

• In our instance-based ethnicity-targeted hate speech detection task, state-of-the-art deep learning models consistently outperform classical machine learning models despite the relatively small dataset, and they benefit significantly from combining linguistic and sentiment features with BERT pre-training and certain fine-tuning techniques;

• In instance-based ethnicity-targeted hate speech detection, deep learning models benefit significantly from adding information about the specific ethnonym to the text representation;

• We are making available to the research community the RuEthnoHate dataset, which contains 5.5K social media texts and is the first dataset annotated for ethnicity-targeted hate speech in Russian.
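The highlights describe an instance-based setup in which information about the specific ethnonym is added to the text representation. The sketch below shows one possible way to build such instances, assuming one instance per (text, ethnonym) pair and a hypothetical `[E]` marker wrapped around the target mention. The names `Instance`, `build_instances`, and the marker are illustrative assumptions, not the paper's actual preprocessing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    text: str         # original social media text
    ethnonym: str     # the ethnic group this instance is about
    marked_text: str  # text with the target mention wrapped in markers


def build_instances(text: str, ethnonyms: List[str], marker: str = "[E]") -> List[Instance]:
    """Hypothetical scheme: create one instance per ethnonym mentioned in the text.

    Only the first occurrence of each ethnonym is marked. Matching is
    case-insensitive and assumes str.lower() preserves string length,
    which holds for Cyrillic and ASCII text.
    """
    instances = []
    lowered = text.lower()
    for eth in ethnonyms:
        start = lowered.find(eth.lower())
        if start == -1:
            continue  # ethnonym not mentioned, so no instance is created
        end = start + len(eth)
        marked = text[:start] + marker + text[start:end] + marker + text[end:]
        instances.append(Instance(text, eth, marked))
    return instances


if __name__ == "__main__":
    for inst in build_instances("Group A neighbours helped Group B.",
                                ["Group A", "Group B", "Group C"]):
        print(inst.ethnonym, "->", inst.marked_text)
```

Each resulting instance would then carry one of the three class labels and be passed to a classifier. An alternative design, equally hypothetical here, is to keep the text unchanged and supply the ethnonym as the second segment of a BERT sentence-pair input.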


Keywords: Hate speech detection, Ethnic hate, Russian language, Deep learning

Article history: Received 30 October 2020, Revised 3 June 2021, Accepted 28 June 2021, Available online 21 July 2021, Version of Record 21 July 2021.

Paper URL: https://doi.org/10.1016/j.ipm.2021.102674