BEstream: Batch Capturing with Elliptic Function for One-Pass Data Stream Clustering

Authors:

Highlights:

Abstract

Applications in business, science, engineering, and medicine generate tremendous amounts of streaming data with varied distributions. This creates new space and time complexity problems: the incoming data can overflow the memory of the analysing machine, and the flow of data may contain scattered portions of data from different clusters. Both situations lead to incorrect clustering results. The challenge in clustering streaming data is that the data grow continuously, are unstable, and may be absent from time to time. This paper proposes the concept of discard-after-cluster, based on a structure of adaptive hyper-elliptic micro-cluster components. Instead of gradually including each datum into its true cluster, the newly proposed set of algorithms captures the data in streaming batches and identifies the clusters afterwards. The number of micro-clusters can increase or decrease according to the dynamic distribution of the incoming data as well as the overlap conditions of the micro-clusters. A set of new recursive functions is introduced for updating parameters, checking overlap conditions, removing micro-clusters, and merging micro-clusters after previously clustered data have been discarded. The proposed algorithm was tested on synthetic and real data sets. The elliptic micro-cluster structure proved more suitable for capturing data than the structures used by the previous methods in the comparison. In addition, the proposed method, named BEstream, produced better results than previous data stream clustering algorithms as measured by the Rand index and normalized mutual information.
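
The abstract does not state the recursive update functions themselves. As a rough illustration of the discard-after-cluster idea only, the sketch below keeps a two-dimensional elliptic micro-cluster as a count, a mean, and a scatter matrix, and folds in each batch with the standard pairwise mean and covariance combination formulas, so the raw points need not be retained. The class name `EllipticMicroCluster`, its methods, and the `radius_sq` threshold are hypothetical and are not taken from the paper; the paper's own functions may differ.

```python
from dataclasses import dataclass, field


@dataclass
class EllipticMicroCluster:
    """Hypothetical 2-D micro-cluster summarised by count, mean, and scatter."""
    n: int = 0
    mean: list = field(default_factory=lambda: [0.0, 0.0])
    # Scatter matrix sum((x - mean)(x - mean)^T), stored as [sxx, sxy, syy].
    scatter: list = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def _combine(self, m, mean_b, scatter_b):
        # Standard pairwise combination of two sets of summary statistics.
        if m == 0:
            return
        total = self.n + m
        dx = mean_b[0] - self.mean[0]
        dy = mean_b[1] - self.mean[1]
        w = self.n * m / total
        self.scatter = [
            self.scatter[0] + scatter_b[0] + w * dx * dx,
            self.scatter[1] + scatter_b[1] + w * dx * dy,
            self.scatter[2] + scatter_b[2] + w * dy * dy,
        ]
        self.mean = [self.mean[0] + dx * m / total,
                     self.mean[1] + dy * m / total]
        self.n = total

    def absorb_batch(self, batch):
        """Fold a batch of (x, y) points into the summary; the batch can then be discarded."""
        m = len(batch)
        if m == 0:
            return
        bx = sum(p[0] for p in batch) / m
        by = sum(p[1] for p in batch) / m
        sxx = sum((p[0] - bx) ** 2 for p in batch)
        sxy = sum((p[0] - bx) * (p[1] - by) for p in batch)
        syy = sum((p[1] - by) ** 2 for p in batch)
        self._combine(m, [bx, by], [sxx, sxy, syy])

    def merge(self, other):
        """Merge another micro-cluster using only its summary statistics."""
        self._combine(other.n, other.mean, other.scatter)

    def mahalanobis_sq(self, point):
        """Squared Mahalanobis distance from the mean under the sample covariance."""
        if self.n < 2:
            raise ValueError("need at least two points to estimate covariance")
        cxx, cxy, cyy = (s / (self.n - 1) for s in self.scatter)
        det = cxx * cyy - cxy * cxy
        if det <= 0.0:
            raise ValueError("covariance is singular")
        dx = point[0] - self.mean[0]
        dy = point[1] - self.mean[1]
        return (dx * dx * cyy - 2.0 * dx * dy * cxy + dy * dy * cxx) / det

    def contains(self, point, radius_sq=9.0):
        # radius_sq is an assumed tuning parameter, not a value from the paper.
        return self.mahalanobis_sq(point) <= radius_sq
```

Because `merge` needs only the summaries of the two micro-clusters, overlapping ellipses can be combined after their original points have been discarded, which is the property the abstract relies on.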

Keywords: Data stream clustering, One-pass learning, Elliptic-micro-cluster

Article history: Received 5 March 2017, Revised 9 May 2018, Accepted 12 July 2018, Available online 18 July 2018, Version of Record 13 October 2018.

Official paper URL: https://doi.org/10.1016/j.datak.2018.07.002