Effective semi-supervised document clustering via active learning with instance-level constraints

Authors: Weizhong Zhao, Qing He, Huifang Ma, Zhongzhi Shi

Abstract

Semi-supervised document clustering, which uses a limited amount of supervised data to group unlabeled documents into clusters, has received significant interest recently. Because obtaining supervised data can be expensive, it is important to acquire the most informative knowledge for improving clustering performance. This paper presents a semi-supervised document clustering algorithm and a new method for actively selecting informative instance-level constraints to improve clustering performance. The semi-supervised document clustering algorithm is a Constrained DBSCAN (Cons-DBSCAN) algorithm, which incorporates instance-level constraints to guide the clustering process in DBSCAN. An active learning approach is proposed to select informative document pairs for which user feedback is obtained. Experimental results show that Cons-DBSCAN with the proposed active learning approach can improve clustering performance significantly when given a relatively small number of constraints.
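
The abstract does not spell out how Cons-DBSCAN uses its constraints, so the following is only a minimal illustrative sketch of the general idea of guiding DBSCAN with instance-level constraints, not the authors' algorithm. It assumes documents are already represented as numeric vectors compared by Euclidean distance. The function name `constrained_dbscan`, its parameters, and the two rules it applies are all assumptions made for illustration: a point is kept out of a cluster that already holds one of its cannot-link partners, and must-link pairs are merged afterwards when that creates no cannot-link conflict.

```python
from math import dist

NOISE = -1


def constrained_dbscan(points, eps, min_pts, must_link=(), cannot_link=()):
    """Illustrative DBSCAN variant with pairwise constraints (hypothetical API).

    points      : list of equal-length numeric tuples (document vectors)
    must_link   : iterable of (i, j) index pairs that should share a cluster
    cannot_link : iterable of (i, j) index pairs that must not share a cluster
    Returns a list of cluster ids, with NOISE (-1) for unassigned points.
    """
    n = len(points)

    # Symmetric lookup of cannot-link partners for each point index.
    forbidden = {i: set() for i in range(n)}
    for a, b in cannot_link:
        forbidden[a].add(b)
        forbidden[b].add(a)

    def neighbors(i):
        # Brute-force eps-neighborhood; includes the point itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    def conflict(group_a, group_b):
        # True if any cannot-link pair spans the two groups.
        return any(forbidden[p] & group_b for p in group_a)

    labels = [None] * n   # None means "not visited yet"
    members = {}          # cluster id -> set of point indices
    next_id = 0

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may be claimed later as a border point
            continue
        cid = next_id
        next_id += 1
        labels[i] = cid
        members[cid] = {i}
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] not in (None, NOISE):
                continue               # already belongs to some cluster
            if forbidden[j] & members[cid]:
                continue               # cannot-link: leave j for another cluster
            labels[j] = cid
            members[cid].add(j)
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # j is a core point, keep expanding
                queue.extend(nbrs)

    # Must-link pass: merge the two sides unless a cannot-link forbids it.
    for a, b in must_link:
        ca, cb = labels[a], labels[b]
        if ca == cb and ca != NOISE:
            continue
        group_a = members[ca] if ca != NOISE else {a}
        group_b = members[cb] if cb != NOISE else {b}
        if conflict(group_a, group_b):
            continue
        if ca != NOISE:
            target = ca
        elif cb != NOISE:
            target = cb
        else:
            target = next_id
            next_id += 1
        merged = group_a | group_b
        for p in merged:
            labels[p] = target
        members[target] = merged
        if cb not in (NOISE, target):
            del members[cb]

    return labels
```

In this sketch a cannot-link pair can split a chain of density-connected points that plain DBSCAN would place in one cluster, and a must-link pair can join two clusters that are too far apart to be density-connected. How the constrained pairs are chosen (the active learning step) is outside what this sketch covers.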

Keywords: Semi-supervised clustering, Document clustering, Active learning, Instance-level constraint

Paper URL: https://doi.org/10.1007/s10115-011-0389-1