Contrastive author-aware text clustering

Authors:

Highlights:

• We study author-aware text clustering by modeling the author effect: each author's texts cover a narrow range of topics and concentrate heavily on a few of them, an observation supported by empirical analysis.

• We devise CAT, a novel contrastive-learning-based text clustering model, and develop cluster-instance contrast and instance-instance contrast, which jointly exploit data augmentation and multi-view representations.

• Experiments on three public datasets show that CAT outperforms competing clustering methods, validating the benefit of modeling the author effect.
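
To make the idea of instance-instance contrast concrete, here is a minimal sketch of a generic InfoNCE-style loss over two views of the same texts. It is not CAT's actual objective, which the highlights do not specify; the function names `cosine` and `info_nce`, the temperature value, and the toy vectors are all assumptions chosen for illustration. The loss is low when each text's two views are more similar to each other than to the views of other texts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss; view_a[i] and view_b[i] are two views of text i.

    Hypothetical helper for illustration only, not the loss used by CAT.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every candidate in the other view.
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching view.
        total += log_denom - logits[i]
    return total / n


# Toy embeddings (assumed): two texts, each with two views.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]      # view i matches anchor i
misaligned = [[0.1, 0.9], [0.9, 0.1]]   # views swapped between texts

print(info_nce(anchors, aligned))
print(info_nce(anchors, misaligned))
```

The aligned pairing yields the smaller loss, which is the signal a contrastive model is trained to maximize agreement on. A cluster-instance variant would, under the same assumptions, replace one view with cluster-level representations.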

Keywords: Text clustering, Contrastive learning, Representation learning

Review history: Received 10 January 2022, Revised 16 April 2022, Accepted 11 May 2022, Available online 12 May 2022, Version of Record 19 May 2022.

Official paper URL: https://doi.org/10.1016/j.patcog.2022.108787