Memory lower bounds for XPath evaluation over XML streams

作者:

Highlights:

摘要

We consider the XPath evaluation problem: Evaluate an XPath query Q on a streaming XML document D. We consider two versions of the problem: 1) Filtering Problem: Determine if there is a match for Q in D. 2) Node Selection Problem: Determine the set Q(D) of document nodes selected by Q. We consider Conjunctive XPath (CXPath) queries that involve only the child and descendant axes. Let d denote the depth of D, and n denote the number of location steps in Q. Bar-Yossef et al. (2007, 2005) [6], [7] presented lower bounds on the memory space required by any algorithm to solve these two problems. Their lower bounds apply to each query in a large subset of XPath, and are obtained (mostly) using nonrecursive (Q,D). In this paper, we present larger lower bounds for a different class of queries (namely, CXPath queries with independent predicates), on recursive (Q,D). One of our results is an Ω(n⋅maxcands(Q,D)) lower bound for the node selection problem, for a worst-case Q; maxcands(Q,D) is the maximum number of nodes of D that can be candidates for output, at any one instant. So, there is no algorithm for the node selection problem that uses O(f(d,|Q|)+maxcands(Q,D)) space, for any function f. This shows that some previously published algorithms are incorrect.

论文关键词:XML,XPath,Query evaluation,Stream processing,Lower bounds

论文评审过程:Received 4 January 2008, Revised 8 January 2010, Accepted 28 October 2010, Available online 3 November 2010.

论文官网地址:https://doi.org/10.1016/j.jcss.2010.10.004