Efficient computation of deletion-robust k-coverage queries

作者:Jiping Zheng, Xingnan Huang, Yuan Ma

摘要

Extracting a controllable subset from a large-scale dataset so that users can fully understand the entire dataset is a significant topic for multicriteria decision making. In recent years, this problem has been widely studied, and various query models have been proposed, such as top-k, skyline, k-regret and k-coverage queries. Among these models, the k-coverage query is an ideal query method; this model has stability, scale invariance and high traversal efficiency. However, current methods including k-coverage queries cannot deal with deleting some points from the dataset while providing an effective solution set efficiently. In this paper, we study the robustness of k-coverage queries in two cases involving the dynamic deletion of data points. The first case is when it is assumed that the whole dataset can be obtained in advance, while the second is when the data points arrive in a stream. For a centralized dataset, we introduce a sieving mechanism and use a precalculated threshold to filter a coreset from the entire dataset. Then, the k-coverage query can be carried out on this small coreset instead of the entire dataset, and we propose a threshold-based k-coverage query algorithm, which greatly accelerates query processing. For a streaming dataset, a special chain structure is adopted. Furthermore, a single-pass streaming algorithm named Robust-Sieving is proposed. Moreover, the coreset-based method is extended to answer the problem. In addition, sampling techniques are adopted to accelerate query processing under these two circumstances. Extensive experiments verify the effectiveness of our proposed Robust-Sieving algorithm and the coreset-based algorithms with or without sampling.

论文关键词: k-coverage queries, Representative skyline, Sieving procedure, Robust-Coreset, Chain structure

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-020-01540-6