Effective social post classifiers on top of search interfaces

Authors: Ryan Rivas, Vagelis Hristidis

Abstract

A substantial body of research applies text classification to find social media posts relevant to a topic of interest. A key challenge is how to select a good training set of posts to label. This problem has traditionally been solved using active learning. However, active learning assumes access to all posts in the collection, which is unrealistic in many cases, because social networks constrain the number of posts that can be retrieved through their search APIs. To address this problem, which we refer to as the training post retrieval over constrained search interfaces problem, we propose several keyword selection algorithms that, given a topic, generate an effective set of keyword queries to submit to the search API. The returned posts are labeled and used as a training dataset for post classifiers. Our experiments compare the proposed keyword selection algorithms to several baselines across various topics from three sources. The results show that the proposed methods generate superior training sets, as measured by the balanced accuracy of the trained classifiers.
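The abstract evaluates training sets by the balanced accuracy of the resulting classifiers. As a minimal sketch of that metric, using its standard definition (the mean of per-class recall), the following may help show why it suits topics where relevant posts are rare. The function name and the toy labels are assumptions made for illustration; they are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, computed over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Number of examples whose true label is c
        total = sum(1 for t in y_true if t == c)
        # Number of those examples that were also predicted as c
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = relevant to the topic, 0 = not relevant
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

score = balanced_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case, plain accuracy is 0.9 even though half of the relevant posts are missed, while balanced accuracy is 0.75 because each class contributes equally to the score.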

Keywords: Text classification, Social media, Search interfaces, Data mining

Paper URL: https://doi.org/10.1007/s10618-021-00768-2