A gradient ascent algorithm based on possibilistic fuzzy C-Means for clustering noisy data

Authors:

Highlights:

Abstract

Real-world data are often corrupted by noise and outliers, which originate from procedures such as data collection, storage, and processing. Noise and outliers degrade clustering quality and lead to inaccurate, misplaced cluster centers. In this paper, we propose a new algorithm called Improved Possibilistic Fuzzy C-Means (IPFCM) to cluster noisy data. First, initial cluster centers are calculated by Possibilistic Fuzzy C-Means (PFCM); these centers do not match the dense regions of the data. Then, the domain is divided into subdomains and each data point is assigned to a subdomain. The cluster centers are iteratively moved towards high-density regions by maximizing a novel cluster validity index. In the proposed method, a Gaussian membership function is defined on each cluster to weight the data, and the sum of the weights in each cluster is calculated. The product of these sums is taken as the validity index. Since the division of the domain changes as the cluster centers move, this procedure is repeated until the convergence criterion is satisfied. Cluster analysis performed on six synthetic and nine real benchmark datasets shows the superiority of IPFCM over previous clustering algorithms such as Fuzzy C-Means (FCM), PFCM, Kernel Fuzzy C-Means (KFCM), Noise Clustering (NC), and Generalized Entropy based Possibilistic Fuzzy C-Means (GEPFCM). The clustering results for near-fault ground motion data indicate that the cluster centers identified by IPFCM are well separated from each other, while those identified by PFCM are close to each other in some datasets. Moreover, the results show that the impact of noisy data on the proposed index, and consequently on the cluster analysis, decreases as the noisy points lie farther from the cluster centers, which is one of the advantages of the IPFCM algorithm.
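
The index described in the abstract (Gaussian weights, per-cluster sums, product of the sums) and the idea of moving centers uphill on it can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption rather than the paper's method: subdomains are taken to be nearest-center regions, a single hypothetical width `sigma` is shared by all clusters, and the step follows the gradient of the logarithm of the index with the partition held fixed. The function names `validity_index` and `ascent_step` are illustrative only.

```python
import numpy as np

def validity_index(X, centers, sigma=1.0):
    """Product over clusters of the summed Gaussian weights (assumed form)."""
    # Squared distance from every point to every center, shape (n, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Assumption: each point belongs to the subdomain of its nearest center.
    labels = d2.argmin(axis=1)
    # Gaussian weight of each point with respect to its own center.
    w = np.exp(-d2[np.arange(len(X)), labels] / (2.0 * sigma ** 2))
    # Sum the weights inside each cluster, then multiply the sums.
    sums = np.bincount(labels, weights=w, minlength=len(centers))
    return float(np.prod(sums)), labels, w

def ascent_step(X, centers, sigma=1.0, lr=0.5):
    """One ascent step on log(index), with the current partition held fixed."""
    centers = np.asarray(centers, dtype=float)
    _, labels, w = validity_index(X, centers, sigma)
    new_centers = centers.copy()
    for k in range(len(centers)):
        mask = labels == k
        if not mask.any():
            continue  # an empty cluster has no gradient; leave it in place
        wk = w[mask]
        # d/dc_k log(sum_i w_i) = sum_i w_i (x_i - c_k) / (sigma^2 * sum_i w_i)
        grad = (wk[:, None] * (X[mask] - centers[k])).sum(axis=0)
        grad /= sigma ** 2 * wk.sum()
        # lr is expressed in units of sigma^2, so lr=1 jumps to the weighted mean.
        new_centers[k] = centers[k] + lr * sigma ** 2 * grad
    return new_centers
```

Calling `ascent_step` repeatedly, and recomputing the partition each time as `validity_index` does, mirrors the repeat-until-convergence loop in the abstract. Because the weight decays exponentially with squared distance, a point far from every center contributes almost nothing to the sums or to the gradient, which is consistent with the abstract's remark that distant noise has little influence.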

Keywords: Possibilistic C-means, Fuzzy clustering, Cluster validity index, Noisy data, Near-fault ground motion

Review history: Received 17 February 2020, Revised 16 September 2021, Accepted 23 October 2021, Available online 16 November 2021, Version of Record 1 December 2021.

Official paper URL: https://doi.org/10.1016/j.eswa.2021.116153