ApproxCCA: An approximate correlation analysis algorithm for multidimensional data streams

Authors:

Highlights:

Abstract

Correlation analysis is a significant challenge in mining multidimensional data streams. Existing correlation analysis methods for data stream mining focus mainly on one-dimensional streams, so identifying the underlying correlations among multivariate arrays (e.g., sensor data) has long been neglected, and canonical correlation analysis (CCA) has rarely been applied to multidimensional data streams. This study proposes ApproxCCA, a novel CCA-based correlation analysis algorithm that explores the correlations between two multidimensional data streams in resource-limited environments. ApproxCCA uses unequal probability sampling and low-rank approximation to reduce the dimensionality of the product matrix formed from the sample covariance matrix and the sample variance matrix, which improves computational efficiency while preserving analytical precision. Experimental results on synthetic and real data sets indicate that ApproxCCA overcomes the computational bottleneck of traditional CCA and accurately detects the correlations between two multidimensional data streams.
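
The abstract names three ingredients: CCA, unequal probability sampling, and low-rank approximation. The sketch below shows one generic way these can be combined. It is not the ApproxCCA algorithm from the paper, whose details are not given here. It assumes NumPy, rows as observations, sampling probabilities proportional to squared row norms, and a plain truncation to the leading `k` canonical correlations as the low-rank step. The function names `cca_from_scatter`, `exact_cca`, and `sampled_cca` are hypothetical.

```python
import numpy as np


def cca_from_scatter(Sxx, Sxy, Syy, k=None, ridge=1e-8):
    """Canonical correlations from (possibly approximate) scatter matrices."""
    dx, dy = Sxx.shape[0], Syy.shape[0]
    # A small ridge, relative to the average variance, keeps Cholesky stable.
    Sxx = Sxx + ridge * np.trace(Sxx) / dx * np.eye(dx)
    Syy = Syy + ridge * np.trace(Syy) / dy * np.eye(dy)
    Lx = np.linalg.cholesky(Sxx)  # Sxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Syy)  # Syy = Ly @ Ly.T
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}.
    A = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, A.T).T
    # The singular values of M are the canonical correlations.
    rho = np.clip(np.linalg.svd(M, compute_uv=False), 0.0, 1.0)
    # Assumed low-rank step: keep only the k leading correlations.
    return rho if k is None else rho[:k]


def exact_cca(X, Y, k=None):
    """Classical CCA on all observations."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return cca_from_scatter(Xc.T @ Xc, Xc.T @ Yc, Yc.T @ Yc, k)


def sampled_cca(X, Y, s, k=None, rng=None):
    """CCA on s observations drawn with unequal probabilities.

    Assumption: row i is drawn with probability proportional to its squared
    norm and rescaled by 1 / sqrt(s * p_i), which makes the sampled scatter
    matrices unbiased estimates of the full ones.
    """
    rng = np.random.default_rng() if rng is None else rng
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    w = (Xc ** 2).sum(axis=1) + (Yc ** 2).sum(axis=1)
    p = w / w.sum()
    idx = rng.choice(len(p), size=s, replace=True, p=p)
    scale = 1.0 / np.sqrt(s * p[idx])
    Xs = Xc[idx] * scale[:, None]
    Ys = Yc[idx] * scale[:, None]
    return cca_from_scatter(Xs.T @ Xs, Xs.T @ Ys, Ys.T @ Ys, k)
```

Calling `sampled_cca(X, Y, s=1000, k=2)` on two arrays with the same number of rows returns the two leading approximate canonical correlations. Comparing them with `exact_cca(X, Y, k=2)` gives a rough sense of the accuracy lost by working with `s` sampled rows instead of the full window.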

Keywords: Multidimensional data streams, Canonical correlation analysis, Probability and statistics, Approximation, Unequal probability sampling

Review history: Received 15 December 2009, Revised 21 March 2011, Accepted 5 April 2011, Available online 13 April 2011.

Official paper URL: https://doi.org/10.1016/j.knosys.2011.04.003