A novel domain and event adaptive tweet augmentation approach for enhancing the classification of crisis related tweets

Authors:

Highlights:

Abstract

One purpose of detecting crisis-related tweets is to single out those that provide information about help needed and offered. Classifying such tweets is difficult because too few annotated tweets are available in those categories. To facilitate such classification, a domain- and event-adaptive augmentation approach is proposed. The main objective of the research is to enhance the classification of crisis-related tweets that have fewer training samples. The proposed algorithms integrate the innate domain and event information when selecting words for augmentation. Components such as the CrisisLex lexicon, Word2Vec embeddings and WordNet are utilized for the proposed augmentation. Experiments are carried out to substantiate the benefits of augmentation. Results indicate increased classifier performance when it is provided with the expanded dataset comprising both the augmented and original tweets. The novel tweet augmentation algorithm can be used to combat the overfitting and class imbalance that arise from scarce training samples. An advantage of the proposed algorithms is their ability to retain the structure and inherent nature of the tweets during augmentation.
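As a rough illustration of the idea in the abstract (word selection for augmentation guided by domain information, with the tweet's structure left intact), the following minimal sketch may help. It is not the authors' algorithm: `CRISIS_TERMS` is a hypothetical toy stand-in for a lexicon such as CrisisLex, `CANDIDATES` is a hypothetical hand-written table standing in for neighbours that would come from Word2Vec or WordNet, and `augment` and `max_swaps` are names invented for this example.

```python
import random

# Hypothetical toy stand-in for a crisis lexicon such as CrisisLex.
CRISIS_TERMS = {"flood", "rescue", "shelter", "donate", "victims"}

# Hypothetical stand-in for replacement candidates that would normally
# come from Word2Vec neighbours or WordNet synonyms.
CANDIDATES = {
    "help": ["rescue", "assist", "aid"],
    "home": ["shelter", "house"],
    "give": ["donate", "offer"],
}


def augment(tweet, rng, max_swaps=1):
    """Return a copy of the tweet with up to max_swaps words replaced.

    Hashtags, mentions and URLs are never touched, and the token count
    is unchanged, so the overall shape of the tweet is preserved.
    """
    tokens = tweet.split()
    # Positions of plain words for which replacement candidates exist.
    positions = [
        i for i, tok in enumerate(tokens)
        if not tok.startswith(("#", "@", "http")) and tok.lower() in CANDIDATES
    ]
    rng.shuffle(positions)
    for i in positions[:max_swaps]:
        options = CANDIDATES[tokens[i].lower()]
        # Prefer candidates found in the crisis lexicon (domain information);
        # fall back to any candidate if none are in the lexicon.
        in_domain = [w for w in options if w in CRISIS_TERMS]
        tokens[i] = rng.choice(in_domain or options)
    return " ".join(tokens)


if __name__ == "__main__":
    print(augment("We need help near #Chennai @user", random.Random(0)))
```

Passing a seeded `random.Random` keeps the output reproducible. A fuller version would presumably also condition the candidate choice on the specific event, which this sketch does not attempt.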

Keywords: Twitter analytics, Tweets augmentation, Deep learning, Crisis analytics, Text Augmentation

Review history: Received 9 November 2020, Revised 28 May 2021, Accepted 13 July 2021, Available online 21 July 2021, Version of Record 27 September 2021.

Official paper URL: https://doi.org/10.1016/j.datak.2021.101913