Single-pass and linear-time k-means clustering based on MapReduce
Authors:
Highlights:
• mrk-means is a novel clustering algorithm based on MapReduce.
• mrk-means is single-pass and linear-time.
• mrk-means produces clusters that are O(log² k)-competitive with the optimal solution.
• mrk-means is both faster and more accurate than Apache Mahout and GraphLab k-means.
Keywords: Distributed k-means, Data clustering, MapReduce-based clustering
Review history: Received 25 May 2014, Accepted 23 February 2016, Available online 4 March 2016, Version of Record 23 March 2016.
Official paper URL: https://doi.org/10.1016/j.is.2016.02.007