Clusterability assessment for Gaussian mixture models

Authors:

Highlights:

Abstract

Numerous measures exist for evaluating the quality of a given data grouping in terms of its distinctness and between-cluster separation. However, there seems to be no efficient method for assessing the distinctness of the intrinsic structure within the data (its clusterability) before an actual clustering is determined. Based on recent findings, we propose such a measure in terms of a covariance matrix decomposition of appropriately transformed data. The data are assumed to come from a Gaussian mixture model. The transformation reshapes the data so that the unsupervised technique of principal component analysis can uncover information directly indicative of the data's clusterability characteristics. In this work we propose the measure and explain its motivation, as well as its relation to supervised structure distinctness coefficients. We also show how the measure can be applied to selecting the number of clusters and to feature selection.

Keywords: Clusterability, Gaussian mixture models, Fisher's discriminant, Principal component analysis

Review history: Available online 13 February 2015.

Official paper URL: https://doi.org/10.1016/j.amc.2014.12.038