Multilingual author profiling on Facebook

Authors:

Highlights:

• Proposed a multilingual (Roman Urdu and English) author profiling corpus of Facebook profiles.

• Manually developed a bilingual (Roman Urdu to English) dictionary of 7749 entries and used it to translate the multilingual corpus.

• Applied 64 stylometric and 11 content-based features to the multilingual and translated corpora.

• Best results obtained using word bigrams for age identification, and word unigrams and character 3-grams and 8-grams for gender identification.
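The highlights mention two concrete processing steps: translating Roman Urdu text with a bilingual dictionary, and extracting word and character n-gram features. The following is a minimal Python sketch of both, assuming a plain word-by-word dictionary lookup and whitespace tokenization. The entries in `RU_TO_EN` and the helper names `translate`, `word_ngrams`, and `char_ngrams` are hypothetical illustrations. They are not the authors' 7749-entry dictionary or their code.

```python
from collections import Counter

# Hypothetical sample entries for illustration only; the paper's
# 7749-entry Roman Urdu to English dictionary is not reproduced here.
RU_TO_EN = {
    "main": "i",
    "acha": "good",
    "hoon": "am",
    "kya": "what",
    "hai": "is",
}


def tokenize(text: str) -> list[str]:
    # Lowercase and split on whitespace (assumption: no punctuation handling).
    return text.lower().split()


def translate(text: str, dictionary: dict[str, str]) -> str:
    # Word-by-word lookup: tokens found in the dictionary are replaced,
    # all other tokens (e.g. English words in code-mixed posts) are kept.
    # Word order is NOT adjusted to English syntax.
    return " ".join(dictionary.get(tok, tok) for tok in tokenize(text))


def word_ngrams(text: str, n: int) -> Counter:
    # Count contiguous sequences of n tokens (n=1 unigrams, n=2 bigrams).
    toks = tokenize(text)
    return Counter(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))


def char_ngrams(text: str, n: int) -> Counter:
    # Count contiguous sequences of n characters, spaces included.
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))
```

Because the lookup is word by word, `translate("Main acha hoon", RU_TO_EN)` yields `"i good am"` rather than "I am good", and in `translate("kya scene hai", RU_TO_EN)` the English word "scene" passes through unchanged, giving `"what scene is"`. The n-gram counts could then serve as feature vectors for a classifier; which classifiers the paper used is not stated in this summary.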

Keywords: Authorship, Author profiling, Multilingual corpus, Facebook, Roman Urdu, Stylometry, N-gram

Article history: Received 4 July 2016, Revised 17 March 2017, Accepted 27 March 2017, Available online 12 April 2017, Version of Record 12 April 2017.

Official paper URL: https://doi.org/10.1016/j.ipm.2017.03.005