d-k-min-wise independent family of hash functions

作者:

Highlights:

• A framework to exponentially improve space and time of min-wise based algorithms.

• Gets around the Ω(log⁡1/ϵ) lower bound needed by min-wise hash functions.

• Only a constant degree of independence is required for the space and time improvements.

• Demonstrates how to apply the framework for similarity and rarity over data stream.

摘要

•A framework to exponentially improve space and time of min-wise based algorithms.•Gets around the Ω(log⁡1/ϵ) lower bound needed by min-wise hash functions.•Only a constant degree of independence is required for the space and time improvements.•Demonstrates how to apply the framework for similarity and rarity over data stream.

论文关键词:min-wise hash functions,Data streams,Windowed data streams,Similarity,Rarity

论文评审过程:Received 19 October 2015, Revised 15 September 2016, Accepted 23 September 2016, Available online 11 October 2016, Version of Record 14 November 2016.

论文官网地址:https://doi.org/10.1016/j.jcss.2016.09.005