Ceasing hate with MoH: Hate Speech Detection in Hindi–English code-switched language

Authors:

Highlights:

• Social media posts consist of Hindi–English code-switched language (a low-resource language).

• Multilingual Bidirectional Encoder Representations from Transformers (BERT) and Multilingual Representations of Indian Languages (MuRIL) embeddings are used to capture the semantic sense of sentences and perform word sense disambiguation.

• Text classification of cyber hate and hate speech.

• Outperforming feature extraction techniques that use simple surface features.

• Overcoming challenges of Indic Trans character-level transliterations.
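
The highlights above describe classifying text from sentence-level embeddings but do not say how token vectors are pooled or which classifier is used. The following is a minimal sketch of one common arrangement, assuming mean pooling of token vectors followed by a nearest-centroid classifier with cosine similarity. The function names (`mean_pool`, `fit_centroids`, `predict`), the two-dimensional toy vectors, and the label strings are all hypothetical and chosen for illustration; in practice the token vectors would come from a multilingual encoder such as BERT or MuRIL.

```python
from math import sqrt
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty sequence of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fit_centroids(embeddings: Sequence[Vector],
                  labels: Sequence[str]) -> Dict[str, Vector]:
    """Compute one centroid per label from sentence embeddings."""
    grouped: Dict[str, List[Vector]] = {}
    for emb, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(emb)
    return {label: mean_pool(vs) for label, vs in grouped.items()}


def predict(centroids: Dict[str, Vector], embedding: Vector) -> str:
    """Return the label whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine(centroids[label], embedding))


# Hypothetical toy sentence embeddings (assumption: real ones would be
# pooled from encoder outputs and have hundreds of dimensions).
train_embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
train_labels = ["hate", "hate", "non-hate", "non-hate"]
centroids = fit_centroids(train_embeddings, train_labels)

# Pool hypothetical token vectors of a new sentence, then classify it.
sentence_embedding = mean_pool([[0.9, 0.1], [0.5, 0.5]])
predicted_label = predict(centroids, sentence_embedding)
```

A nearest-centroid classifier is used here only because it needs no external library; any classifier that accepts fixed-length vectors could take its place.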

Keywords: Cyber hate, Social media, Data simulations, BERT, MuRIL, Transfer learning, Text classification, Machine learning

Review history: Received 21 January 2021, Revised 9 September 2021, Accepted 11 September 2021, Available online 9 November 2021, Version of Record 9 November 2021.

Official paper URL: https://doi.org/10.1016/j.ipm.2021.102760