Multilingual topic modeling for tracking COVID-19 trends based on Facebook data analysis

作者:Amina Amara, Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha

摘要

Social data has shown important role in tracking, monitoring and risk management of disasters. Indeed, several works focused on the benefits of social data analysis for the healthcare practices and curing domain. Similarly, these data are exploited now for tracking the COVID-19 pandemic but the majority of works exploited Twitter as source. In this paper, we choose to exploit Facebook, rarely used, for tracking the evolution of COVID-19 related trends. In fact, a multilingual dataset covering 7 languages (English (EN), Arabic (AR), Spanish (ES), Italian (IT), German (DE), French (FR) and Japanese (JP)) is extracted from Facebook public posts. The proposal is an analytics process including a data gathering step, pre-processing, LDA-based topic modeling and presentation module using graph structure. Data analysing covers the duration spanned from January 1st, 2020 to May 15, 2020 divided on three periods in cumulative way: first period January-February, second period March-April and the last one to 15 May. The results showed that the extracted topics correspond to the chronological development of what has been circulated around the pandemic and the measures that have been taken according to the various languages under discussion representing several countries.

论文关键词:Social media analysis, Covid-19, Topic modeling, Facebook, Data visualization, Multilingual

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-020-02033-3