On efficient conditioning of probabilistic relational databases

作者:

Highlights:

摘要

A probabilistic relational database is a probability distribution over a set of deterministic relational databases (namely, possible worlds). Efficient updating information in probabilistic databases is required in several applications, such as sensor networking and data cleaning. As a way to update a probabilistic database, conditioning refines the probability distribution of the possible worlds based on general knowledge, such as functional dependencies. The existing methods for conditioning are exponential over the number of variables in the probabilistic database for an arbitrary constraint. In this paper, a constraint-based conditioning framework is proposed, which solves the conditioning problem by considering only the variables in the given constraint. Then, we prove the correctness of our proposed approach and provide efficient algorithms for each step of the approach. Afterward, a pruning strategy that can significantly improve the efficiency of the constraint-based approach is proposed for the functional dependency constraints. Furthermore, for functional dependency constraints, a variable-elimination strategy that minimizes the number of generated variables can benefit the subsequent query processing. The experimental study shows that the constraint-based approach is more efficient than other approaches described in the literature. The effectiveness of the two optimization strategies for functional dependency constraints is also demonstrated in the experiment.

论文关键词:Probabilistic databases,Possible worlds,Conditioning,Functional dependency,Constraints

论文评审过程:Received 11 February 2015, Revised 10 October 2015, Accepted 15 October 2015, Available online 26 October 2015, Version of Record 11 December 2015.

论文官网地址:https://doi.org/10.1016/j.knosys.2015.10.017