A real-time deep-learning approach for filtering Arabic low-quality content and accounts on Twitter

Highlights:

• This research shows that training a deep learning model on a dataset that includes several types of low-quality tweets can be an efficient way to filter such content in a real-time setting.

• Two embedding methods (word-level and character-level) are compared on the task of classifying tweets as either legitimate or low-quality, using a dataset of 40,000 tweets collected for this project.

• We also show that Twitter accounts can be efficiently classified as spam or genuine profiles using only the text of their recent tweets and a deep learning model.
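The word-level versus character-level comparison comes down to how a tweet is turned into a sequence of integer ids before it reaches an embedding layer. The Python sketch below is only an illustration under stated assumptions, not the authors' preprocessing. The whitespace tokenizer, the reserved ids `PAD = 0` and `OOV = 1`, the helper names (`word_tokens`, `char_tokens`, `build_vocab`, `encode`) and the toy Arabic phrases are all hypothetical. It shows how an unseen inflected form maps to the out-of-vocabulary id at word level while every one of its characters still receives a known id at character level.

```python
from collections import Counter

# Hypothetical reserved ids for this sketch: 0 pads short sequences, 1 marks unknown tokens.
PAD, OOV = 0, 1


def word_tokens(text):
    # Assumed whitespace tokenizer; a real Arabic pipeline would also
    # normalise letter variants and strip diacritics before splitting.
    return text.split()


def char_tokens(text):
    # Every character, including spaces, becomes one token.
    return list(text)


def build_vocab(texts, tokenize, min_count=1):
    # Count tokens over the training texts and assign ids from 2 upwards,
    # most frequent first (ties broken by the token itself for determinism).
    counts = Counter(tok for t in texts for tok in tokenize(t))
    vocab = {}
    for tok, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if c >= min_count:
            vocab[tok] = len(vocab) + 2
    return vocab


def encode(text, vocab, tokenize, max_len):
    # Map tokens to ids (unknown -> OOV), truncate, then right-pad with PAD.
    ids = [vocab.get(tok, OOV) for tok in tokenize(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Toy training phrases (illustrative only, not from the paper's dataset).
train = ["السلام عليكم", "اشترك الآن", "اشترك معنا الآن"]
word_vocab = build_vocab(train, word_tokens)
char_vocab = build_vocab(train, char_tokens)

tweet = "اشتركي الآن"  # inflected form never seen as a whole word in training
word_ids = encode(tweet, word_vocab, word_tokens, max_len=5)
char_ids = encode(tweet, char_vocab, char_tokens, max_len=15)
print(word_ids)
print(char_ids)
```

In this toy setup the first word of the tweet collapses to `OOV` in `word_ids`, whereas `char_ids` contains no `OOV` at all. Either id sequence would then be fed to an embedding layer that maps each id to a dense vector.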

Keywords: Low-quality content in social networks, Spam accounts, Real-time detection system, Deep learning techniques

Article history: Received 16 September 2020; Revised 2 February 2021; Accepted 7 February 2021; Available online 12 February 2021; Version of Record 19 February 2021.