Probabilistic object deputy model for uncertain data and lineage management

Authors:

Abstract

Lineage is important in uncertain data management because it identifies which parts of the data contribute to a result and supports computing the probability of that result. However, existing work treats an uncertain tuple as a set of tuples stored in a relational table. Lineage can then trace a result back only to tuples in the table, not to the specific attributes that contribute to it. When uncertain tuples have multiple uncertain attributes, users cannot tell which attribute is the main cause of a low-probability result tuple. In this paper, we propose an approach to modeling uncertain data. Compared with the alternative based on the relational model, our model has a low maintenance cost and avoids a large amount of redundant storage and many join operations. Based on our model, we define operations for querying data, generating lineage, computing probabilities, and deriving results. We further discuss how to compute probabilities correctly from lineage and propose an algorithm that transforms lineage so that probabilities are computed correctly. We also discuss how to realize result derivation with lineage. Experiments show the advantages of the proposed model for uncertain data management.
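To illustrate why computing a probability from lineage needs care, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that lineage is a Boolean formula over independent base tuples, and the tuple names `t1`, `t2`, `t3`, their probabilities, and the helper `lineage_probability` are all hypothetical. When a base tuple appears more than once in the lineage, treating the sub-formulas as independent gives a wrong answer, whereas enumerating possible worlds gives the exact one.

```python
from itertools import product

def lineage_probability(formula, base_probs):
    """Exact probability of a Boolean lineage formula, obtained by
    enumerating every possible world over independent base tuples.
    (Hypothetical helper; exponential in the number of base tuples.)"""
    names = sorted(base_probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        # Probability of this world: product over independent base tuples.
        weight = 1.0
        for name in names:
            p = base_probs[name]
            weight *= p if world[name] else 1.0 - p
        if formula(world):
            total += weight
    return total

# Assumed example data: three independent base tuples.
base_probs = {"t1": 0.5, "t2": 0.6, "t3": 0.7}

# Assumed lineage of a result tuple: (t1 AND t2) OR (t1 AND t3).
def lineage(w):
    return (w["t1"] and w["t2"]) or (w["t1"] and w["t3"])

exact = lineage_probability(lineage, base_probs)

# Naive computation that wrongly treats the two conjuncts as independent,
# even though both depend on t1.
p_left = base_probs["t1"] * base_probs["t2"]
p_right = base_probs["t1"] * base_probs["t3"]
naive = 1.0 - (1.0 - p_left) * (1.0 - p_right)

print(f"exact = {exact:.3f}, naive = {naive:.3f}")
```

Factoring the shared tuple out, as in t1 AND (t2 OR t3), makes the sub-formulas independent, so the simple product rules apply again. This is one general way to rewrite lineage before computing a probability; the paper's own transformation algorithm may differ.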

Keywords: Uncertain data, Data modeling, Lineage, Probability computation

Review history: Received 2 March 2017, Revised 2 March 2017, Accepted 2 March 2017, Available online 18 March 2017, Version of Record 31 May 2017.

Official URL: https://doi.org/10.1016/j.datak.2017.03.005