Fast, scalable and geo-distributed PCA for big data analytics
Authors:
Highlights:
• An efficient block-division approach for PCA on data of arbitrarily large dimensionality
• A highly scalable algorithm that avoids memory-overflow errors on big data
• A fast, communication-efficient accumulation scheme for geo-distributed environments
• An optimized Spark implementation that is 10× more scalable and 1.1× to 42× faster
• A 1.3× to 2.9× improvement in running time in geo-distributed environments
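The accumulation idea behind the highlights can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each site can hold a full d×d Gram matrix in memory, which is exactly what a block-division approach is meant to avoid when the dimensionality is large. The function names (`partial_stats`, `merge`, `covariance`, `top_component`) are hypothetical and chosen only for this illustration. Each site summarizes its local rows, the summaries are added together, and the leading principal component is then estimated from the merged covariance by power iteration.

```python
from math import sqrt

def partial_stats(rows):
    """Per-site summary: row count, column sums, and Gram matrix X^T X."""
    d = len(rows[0])
    col_sums = [0.0] * d
    gram = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            col_sums[i] += r[i]
            for j in range(d):
                gram[i][j] += r[i] * r[j]
    return len(rows), col_sums, gram

def merge(a, b):
    """Combine two summaries by addition; raw rows never leave their site."""
    n = a[0] + b[0]
    col_sums = [x + y for x, y in zip(a[1], b[1])]
    gram = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return n, col_sums, gram

def covariance(stats):
    """Population covariance (divides by n) recovered from a merged summary."""
    n, col_sums, gram = stats
    mean = [x / n for x in col_sums]
    d = len(col_sums)
    return [[gram[i][j] / n - mean[i] * mean[j] for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=200):
    """Leading eigenvector by power iteration (assumes cov is not all zeros)."""
    d = len(cov)
    v = [1.0 / sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Two hypothetical sites holding rows of the same 2-dimensional data set.
site_a = partial_stats([[1, 2], [2, 4]])
site_b = partial_stats([[3, 6], [4, 8], [5, 10]])
merged = merge(site_a, site_b)
component = top_component(covariance(merged))
```

Because `merge` is a plain element-wise sum, summaries can be combined in any order or grouping, which is the property that lets partial results be accumulated across sites before a single final decomposition.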
Keywords: Big data, PCA, Dimensionality reduction, Geo-distributed algorithm
Review history: Received 26 May 2019, Revised 19 November 2020, Accepted 28 December 2020, Available online 6 January 2021, Version of Record 15 January 2021.
Paper URL: https://doi.org/10.1016/j.is.2020.101710