Subspace multi-clustering: a review | 数据学习(DataLearner)

Abstract

Clustering has been widely used to identify possible structures in data and to help users understand data in an unsupervised manner. Traditional clustering methods usually produce a single partitioning of the data that places similar objects in the same group and separates dissimilar ones into different groups. However, it is well recognized that assuming only a single clustering for a data set can be too strict and cannot capture the diversity found in applications. Multiple clustering structures can be hidden in the data, each representing a unique perspective on the data. Various multi-clustering methods, which aim to discover multiple independent structures hidden in data, have been proposed over the past decade. Although multi-clustering methods may give users more information, it remains challenging for users to understand each clustering structure efficiently and effectively. Subspace multi-clustering methods address this challenge by associating each clustering with a feature subspace. Moreover, most subspace multi-clustering methods scale well to high-dimensional data, which has become increasingly common in real applications owing to advances in big data technologies. In this paper, we focus on subspace multi-clustering, a subject that no previous survey has reviewed. We formulate the subspace multi-clustering problem and categorize the methodologies from different perspectives (e.g., de-coupled methods and coupled methods). We compare the methods on a series of specific properties (e.g., input parameters and the kinds of subspaces they use) and analyze their advantages and disadvantages. We also discuss several interesting and meaningful future directions.
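As a rough illustration of the idea that each clustering lives in its own feature subspace, the following minimal sketch clusters the same eight objects twice, once per subspace, and obtains two different partitions. It is not a method from the review: the toy data, the function names (`two_means`, `subspace_multi_clustering`) and the hand-picked subspaces are all assumptions made for this example, and a plain two-center k-means stands in for whatever clustering algorithm a real method would use. Actual subspace multi-clustering methods also have to search for the subspaces rather than receive them as input.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def project(point, subspace):
    """Keep only the features whose indices are listed in `subspace`."""
    return tuple(point[i] for i in subspace)


def two_means(points, iterations=10):
    """Tiny k-means with k=2 and a deterministic initialization.

    The centers start at the first point and at the point farthest from it.
    """
    first = points[0]
    farthest = max(points, key=lambda p: squared_distance(p, first))
    centers = [first, farthest]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            0 if squared_distance(p, centers[0]) <= squared_distance(p, centers[1]) else 1
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def subspace_multi_clustering(data, subspaces):
    """Return one clustering (list of labels) per given feature subspace."""
    return {s: two_means([project(p, s) for p in data]) for s in subspaces}


# Hypothetical toy data: feature 0 and feature 1 each hide a different
# grouping of the same objects; feature 2 is nearly constant and irrelevant.
data = [
    (0.0, 0.1, 5.0),
    (0.2, 9.0, 5.1),
    (0.1, 0.3, 4.9),
    (0.3, 9.2, 5.0),
    (8.0, 0.2, 5.1),
    (8.1, 9.1, 4.9),
    (7.9, 0.0, 5.0),
    (8.2, 8.9, 5.1),
]

# The subspaces are supplied by hand here (an assumption of this sketch).
clusterings = subspace_multi_clustering(data, [(0,), (1,)])
for subspace, labels in clusterings.items():
    print(subspace, labels)
```

The two label lists disagree on how the objects are grouped, and each grouping can be explained by the single feature that defines its subspace, which is the interpretability benefit the abstract describes.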