Random spatial subspace clustering | 数据学习(DataLearner)

Abstract

Strong spatial or temporal correlation exists in many types of data, for example, the hyperspectral data acquired by a spectrometer scanning through rock samples from a drill hole. It is of practical interest to identify spatially continuous segments in a given data set where we know a priori that the samples are strongly correlated spatially. Recently, a novel method called spatial subspace clustering (SpatSC) was proposed to address this problem. However, due to the subspace learning nature of the SpatSC model, this method becomes intractable when the number of samples to be processed is very large. To reduce the computational burden, we propose a method called random spatial subspace clustering, or RSSC for short. In RSSC, only a subset of the data is segmented by SpatSC, and an overall solution is obtained through propagation. This reduces the computational cost significantly. Yet a very important question is to what extent the RSSC solution differs from that of SpatSC. In this paper, we analyse the propagation procedure and derive an average error rate of the RSSC solution compared to the SpatSC solution on the whole data set. The results show that the RSSC clustering result is close to the SpatSC result under mild conditions, which provides a theoretical performance guarantee for RSSC. Our analysis also reveals that guided random sampling, implemented by crude spatial clustering, is crucial in improving RSSC results. We evaluate RSSC quantitatively on various data sets to assess its effectiveness under different settings. The results show that RSSC performs similarly to SpatSC, as indicated by the theory, while its computational cost is only a fraction of that of SpatSC.
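The sample-then-propagate idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names are hypothetical, `segment_subset` is a simple jump-threshold rule standing in for SpatSC (it is not subspace clustering), `block_sample` is one possible reading of "guided random sampling" (one random index per contiguous block), and `propagate` uses a nearest-sampled-neighbour rule that may differ from the paper's propagation procedure.

```python
import bisect
import random


def block_sample(n, block, rng):
    """Pick one random index from each contiguous block of length `block`.

    Assumption: a crude stand-in for guided random sampling.
    The returned indices are sorted by construction.
    """
    return [rng.randrange(start, min(start + block, n))
            for start in range(0, n, block)]


def segment_subset(values, threshold):
    """Stand-in for SpatSC (assumption): start a new segment whenever two
    consecutive sampled values differ by more than `threshold`."""
    labels = [0]
    for prev, cur in zip(values, values[1:]):
        labels.append(labels[-1] + (1 if abs(cur - prev) > threshold else 0))
    return labels


def propagate(n, sample_idx, sample_labels):
    """Give every position the label of its nearest sampled position.

    Ties go to the sampled position on the left.
    """
    out = []
    for i in range(n):
        j = bisect.bisect_left(sample_idx, i)
        if j == 0:
            k = 0
        elif j == len(sample_idx):
            k = j - 1
        else:
            left_gap = i - sample_idx[j - 1]
            right_gap = sample_idx[j] - i
            k = j - 1 if left_gap <= right_gap else j
        out.append(sample_labels[k])
    return out


def rssc_sketch(data, block=5, threshold=1.0, seed=0):
    """Segment only a sampled subset, then propagate labels to all samples."""
    rng = random.Random(seed)
    idx = block_sample(len(data), block, rng)
    sub_labels = segment_subset([data[i] for i in idx], threshold)
    return propagate(len(data), idx, sub_labels)


# Toy 1-D signal with one true boundary at position 20.
data = [0.0] * 20 + [5.0] * 20
truth = [0] * 20 + [1] * 20
labels = rssc_sketch(data)
errors = sum(a != b for a, b in zip(labels, truth))
print(f"mislabelled samples: {errors} of {len(data)}")
```

In this toy setting only the samples near the true boundary can be mislabelled, because the boundary is located from the sampled positions alone. This is the kind of discrepancy between the propagated solution and the full solution that the abstract's error-rate analysis is concerned with.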