Mining user-generated content in an online smoking cessation community to identify smoking status: A machine learning approach

Authors:

Highlights:

• Analyze large-scale user-generated data in an online health community for behavior change

• Identify individual authors' smoking status (quit vs not) with machine learning

• Include novel features beyond content of focal posts that improve classifier performance

• Incorporate technical innovations based upon insights from domain experts

Abstract

Online smoking cessation communities help hundreds of thousands of smokers quit smoking and stay abstinent each year. Content shared by users of such communities may contain important information that could enable more effective and personally tailored cessation treatment recommendations. This study demonstrates a novel approach to determine individuals' smoking status by applying machine learning techniques to classify user-generated content in an online cessation community. Study data were from BecomeAnEX.org, a large, online smoking cessation community. We extracted three types of novel features from a post: domain-specific features, author-based features, and thread-based features. These features helped to improve the smoking status identification (quit vs. not) performance by 9.7% compared to using only text features of a post's content. In other words, knowledge from domain experts, data regarding the post author's patterns of online engagement, and other community member reactions to the post can help to determine the focal post author's smoking status, over and above the actual content of a focal post. We demonstrated that machine learning methods can be applied to user-generated data from online cessation communities to validly and reliably discern important user characteristics, which could aid decision support on intervention tailoring.
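The idea of augmenting a post's text features with the three additional feature families can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the abstract does not list the actual features, so the lexicon `QUIT_TERMS`, the function `extract_features`, and every feature name below are hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical domain lexicon (assumption); the expert-derived features
# used in the study are not enumerated in the abstract.
QUIT_TERMS = {"quit", "smokefree", "abstinent"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(post, author_history, thread_replies):
    """Build one feature dict for a focal post.

    post           -- text of the focal post
    author_history -- earlier posts by the same author (list of str)
    thread_replies -- replies from other members in the thread (list of str)
    """
    tokens = tokenize(post)

    # Baseline: bag-of-words counts from the focal post's own content.
    features = {f"word={w}": c for w, c in Counter(tokens).items()}

    # Domain-specific feature (assumed): hits against the expert lexicon.
    features["domain:quit_term_count"] = sum(t in QUIT_TERMS for t in tokens)

    # Author-based feature (assumed): how active the author has been.
    features["author:prior_post_count"] = len(author_history)

    # Thread-based features (assumed): how other members reacted.
    features["thread:reply_count"] = len(thread_replies)
    features["thread:congrats_replies"] = sum(
        "congrat" in reply.lower() for reply in thread_replies
    )
    return features


example = extract_features(
    "I quit 30 days ago",
    ["First post", "Second post"],
    ["Congrats!", "Well done"],
)
```

A dictionary like `example` could then be vectorized and passed to any standard classifier; comparing a model trained on the `word=` features alone against one trained on the full dictionary mirrors the text-only versus augmented comparison described above.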

Keywords: Machine learning, Text mining, Smoking cessation, Online community, Social network

Review history: Received 20 June 2018, Revised 11 September 2018, Accepted 10 October 2018, Available online 15 October 2018, Version of Record 17 November 2018.

Official paper URL: https://doi.org/10.1016/j.dss.2018.10.005