An incremental algorithm for clustering spatial data streams: exploring temporal locality

Authors: Ling-Yin Wei, Wen-Chih Peng

Abstract

Clustering sensor data discovers useful information hidden in sensor networks. In a sensor network, each sensor has two types of attributes: a geographic attribute (i.e., its spatial location) and non-geographic attributes (e.g., sensed readings). Sensor data are collected periodically and viewed as spatial data streams, where a spatial data stream consists of a sequence of data points with attributes in both the geographic and non-geographic domains. Previous studies have formulated a dual clustering problem for spatial data by considering similarity-connected relationships in both the geographic and non-geographic domains. However, clustering in stream environments is time-sensitive because sensor data are updated frequently. The readings from a single sensor tend to remain similar over a period of time, a property referred to as temporal locality. Exploiting the temporal locality of sensor data, this study proposes an incremental clustering (IC) algorithm to discover clusters efficiently. The IC algorithm comprises two phases: cluster prediction and cluster refinement. The cluster prediction phase estimates, from previous clustering results, the probability that two sensors belong to the same cluster, and derives a coarse clustering result from that estimate. The cluster refinement phase then refines the coarse result. This study evaluates the performance of the IC algorithm using synthetic and real datasets. Experimental results show that the IC algorithm outperforms existing approaches and confirm its scalability. In addition, the effect of temporal locality on the IC algorithm is analyzed and examined in the experiments.
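
The two-phase structure described in the abstract can be illustrated with a small Python sketch. This is not the authors' IC algorithm, whose details the abstract does not give. It assumes, purely for illustration, that the co-clustering probability is the fraction of past snapshots in which two sensors shared a cluster, that geographic connectivity is given as a list of neighbor pairs, and that readings are single numbers compared against a threshold. The names `co_cluster_probability`, `incremental_cluster`, `p_min`, and `delta` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_cluster_probability(history):
    """Assumed estimator: fraction of past snapshots in which a pair shared a cluster."""
    counts = defaultdict(int)
    for labels in history:  # labels: dict mapping sensor id -> cluster label
        for a, b in combinations(sorted(labels), 2):
            if labels[a] == labels[b]:
                counts[(a, b)] += 1
    return {pair: c / len(history) for pair, c in counts.items()}


def connected_components(nodes, edges):
    """Group nodes into connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(sorted(g) for g in groups.values())


def incremental_cluster(history, neighbors, readings, p_min=0.5, delta=1.0):
    """Return (coarse, refined) clusterings for the current readings."""
    sensors = sorted(readings)
    prob = co_cluster_probability(history)

    # Phase 1 (prediction): link geographic neighbors that co-clustered
    # often enough in the past to form a coarse clustering.
    coarse_edges = [
        (a, b) for a, b in neighbors
        if prob.get((min(a, b), max(a, b)), 0.0) >= p_min
    ]
    coarse = connected_components(sensors, coarse_edges)

    # Phase 2 (refinement): inside each coarse cluster, keep only links
    # whose current readings are still similar, which may split the cluster.
    refined = []
    for cluster in coarse:
        members = set(cluster)
        edges = [
            (a, b) for a, b in coarse_edges
            if a in members and b in members
            and abs(readings[a] - readings[b]) <= delta
        ]
        refined.extend(connected_components(cluster, edges))
    return coarse, sorted(refined)


# Toy data (made up for illustration): three sensors in a line.
history = [
    {"s1": 0, "s2": 0, "s3": 1},
    {"s1": 0, "s2": 0, "s3": 0},
]
neighbors = [("s1", "s2"), ("s2", "s3")]
readings = {"s1": 20.0, "s2": 20.4, "s3": 25.0}
coarse, refined = incremental_cluster(history, neighbors, readings)
```

In this simplified sketch the refinement step can only split coarse clusters, never merge them, so a low `p_min` makes the coarse result more permissive and leaves more work to refinement. When temporal locality is strong, most predicted links survive refinement, which is the intuition behind reusing previous results.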

Keywords: Data mining, Dual clustering, Spatial data streams

Paper URL: https://doi.org/10.1007/s10115-013-0636-8