Vulnerable community identification using hate speech detection on social media

Highlights:

• In summary, our proposed approach makes the following contributions:
○ We investigate the application of hate speech detection to vulnerable community identification.
○ We identify a potentially vulnerable community, in terms of the hatred directed at it on social media, using Amharic text data from Facebook as an example.
○ We collected and annotated Amharic data for the task of hate speech detection, suited to multicultural societies such as Ethiopia.
○ We use the Apache Spark distributed platform for data pre-processing and feature extraction, because social media data is large and noisy and requires efficient processing tools.

Keywords: Vulnerable community identification, Data annotation, Amharic text processing, Hate speech detection, Spark distributed framework

Article history: Received 30 November 2018, Revised 16 July 2019, Accepted 16 July 2019, Available online 23 July 2019, Version of Record 16 March 2020.