Deception detection on social media: A source-based perspective

Abstract

Fast, open, free, and accessible, online social networks are massively used to share news and all kinds of information. Unfortunately, their explosive growth amplifies the dissemination of misinformation, posing a severe threat to our societies. They have also become a home ground for wrongdoers who spread fake news, rumors, conspiracy theories, hoaxes, and other forms of deception. There is therefore an urgent need for efficient algorithms to tackle this infodemic. Current research focuses mainly on the news content or its context to distinguish credible information from non-credible information. Content-based methods concentrate on detecting the false knowledge a message carries or the deceptive cues in its writing style. Context-based methods rely on the propagation patterns of the news or on the credibility of its sources. Finally, hybrid methods combine these approaches. This paper proposes a source-based method within a machine learning framework. It focuses on the profiles and interactions of the news spreaders. On Twitter, we associate each news article with the interaction network formed by the authors of the tweets sharing it and their first-degree ego networks. Experiments are conducted on two publicly available real-world datasets covering different domains (Politifact and GossipCop). We select the most diversified news to assess the performance of various machine learning approaches. An extensive investigation identifies the best network and user-profile features and the most effective machine learning model, that is, the one that classifies news with the highest accuracy. The most effective feature set is well balanced: it includes four network features and four user-profile features. Results show that the XGBoost model outperforms its alternatives (Random Forest, Decision Tree, Multi-Layer Perceptron, K-Nearest Neighbors, and Support Vector Machine), achieving 92% and 91% accuracy on the Politifact and GossipCop datasets, respectively.
Comparisons with baselines covering a broad spectrum of solutions demonstrate its superiority. The proposed solution is also promising in practice: it requires little information and only a few news articles for training, and it can detect deception at an early stage.
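To make the source-based idea more concrete, here is a minimal sketch of how the spreaders of a single news article could be turned into a fixed-length feature vector combining network and user-profile features. It rests on several assumptions: the interaction edges, given as (author, neighbor) pairs drawn from the authors' first-degree ego networks, are assumed to be already collected; the four network features (node count, edge count, density, average clustering coefficient) and the four user-profile features (mean followers, mean friends, share of verified accounts, mean account age) are hypothetical stand-ins, not the feature set selected in this paper. The `User` record and all function names are likewise illustrative.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import mean


@dataclass
class User:
    # Hypothetical profile record; the actual profile fields used by the paper may differ.
    user_id: str
    followers: int
    friends: int
    verified: bool
    account_age_days: int


def build_interaction_network(authors, ego_edges):
    """Undirected graph as adjacency sets: the tweet authors of one news article
    plus the first-degree neighbors appearing in (author, neighbor) edge pairs."""
    adj = {a: set() for a in authors}
    for u, v in ego_edges:
        if u == v:  # ignore self-loops
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of the node's neighbors that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def network_features(adj):
    """Four illustrative network features: nodes, edges, density, avg clustering."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    density = 2.0 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_clustering = mean(local_clustering(adj, v) for v in adj) if n else 0.0
    return [float(n), float(m), density, avg_clustering]


def profile_features(authors, profiles):
    """Four illustrative user-profile features, averaged over the tweet authors."""
    users = [profiles[a] for a in authors if a in profiles]
    if not users:
        return [0.0, 0.0, 0.0, 0.0]
    return [
        float(mean(u.followers for u in users)),
        float(mean(u.friends for u in users)),
        sum(u.verified for u in users) / len(users),  # share of verified accounts
        float(mean(u.account_age_days for u in users)),
    ]


def news_feature_vector(authors, ego_edges, profiles):
    """Concatenate network and profile features for one news article."""
    adj = build_interaction_network(authors, ego_edges)
    return network_features(adj) + profile_features(authors, profiles)


if __name__ == "__main__":
    profiles = {
        "a": User("a", 120, 80, False, 400),
        "b": User("b", 5000, 300, True, 2000),
    }
    edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
    print(news_feature_vector(["a", "b"], edges, profiles))
```

Stacking one such vector per news article, together with its fake/real label, yields a training matrix for any of the classifiers compared above. With the `xgboost` package, for instance, `XGBClassifier().fit(X, y)` is the scikit-learn-style entry point; the hyperparameters would still need to be tuned.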