Age and gender in language, emoji, and emoticon usage in instant messages

Authors:

Highlights:

• We analyze a dataset of 309,229 WhatsApp instant messages (N = 226).

• We identify age- and gender-linked variations in emoji, emoticon, and language usage.

• We use machine learning algorithms to significantly predict age and gender.

• We identify the most predictive language features.

• We discuss implications for user privacy in instant messaging.

Keywords: Age, Gender, Author profiling, Instant messages, Machine learning, Digital footprints

Article history: Received 5 November 2020, Revised 16 August 2021, Accepted 17 August 2021, Available online 18 August 2021, Version of Record 1 October 2021.

DOI: https://doi.org/10.1016/j.chb.2021.106990