ADC: Advanced document clustering using contextualized representations

Authors:

Highlights:

• A document clustering framework that leverages contextualized vectors is proposed.

• Informative representations for documents are extracted from pre-trained models.

• A partial optimization and centroid update is proposed in the clustering module.

• The proposed method outperforms the baselines on several clustering datasets.

• The effects of the clustering method and of the embeddings are explored in various experiments.
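The highlights do not spell out how the clustering module works, so what follows is only a minimal sketch of the general idea, not the paper's method. It assumes that each document vector is the mean of token-level contextualized vectors from some pre-trained encoder (mean pooling is an assumption). For clustering it substitutes spherical k-means, a standard centroid-based method that uses cosine similarity; this is not the paper's partial optimization or centroid update. All helper names (`mean_vector`, `normalize`, `cosine`, `spherical_kmeans`) are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length vectors (used here as assumed mean pooling)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, assuming both inputs are already unit-normalized."""
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(docs: Sequence[Sequence[float]], k: int,
                     iters: int = 20, seed: int = 0) -> List[int]:
    """Hypothetical stand-in clusterer: k-means on the unit sphere with cosine similarity."""
    data = [normalize(d) for d in docs]
    rng = random.Random(seed)
    # Initialize centroids with k distinct documents chosen at random.
    centroids = [list(c) for c in rng.sample(data, k)]
    labels = [-1] * len(data)
    for _ in range(iters):
        # Assignment step: each document joins its most cosine-similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(x, centroids[j])) for x in data]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: each centroid becomes the re-normalized mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = normalize(mean_vector(members))
    return labels


if __name__ == "__main__":
    # Toy "token vectors" for four documents (made-up 2-D values, not real embeddings).
    token_vecs = [
        [[1.0, 0.0], [0.9, 0.2]],
        [[0.8, 0.1], [1.0, 0.0]],
        [[0.0, 1.0], [0.1, 0.9]],
        [[0.2, 1.0], [0.0, 0.8]],
    ]
    doc_vecs = [mean_vector(tokens) for tokens in token_vecs]
    print(spherical_kmeans(doc_vecs, k=2))
```

On the toy input, the first two documents land in one cluster and the last two in the other. Which cluster receives label 0 depends on the random initialization. Normalizing the vectors first lets the assignment step use a plain dot product as the cosine similarity.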

Keywords: Natural language processing, Document clustering, Contextualized representations, Cosine similarity, Deep clustering

Article history: Received 5 March 2019, Revised 28 June 2019, Accepted 28 June 2019, Available online 29 June 2019, Version of Record 8 July 2019.

Official paper page: https://doi.org/10.1016/j.eswa.2019.06.068