Checking determinism of XML Schema content models in optimal time

Authors:

Highlights:

Abstract

We consider checking the determinism of XML Schema content models, as required by the W3C Recommendation. We argue that currently applied solutions are flawed and leave processors vulnerable to exponential resource consumption on pathological schemas, and we help to eliminate this potential vulnerability of XML Schema-based systems. XML Schema content models are essentially regular expressions extended with numeric occurrence indicators. We improve a previously published polynomial-time algorithm for checking the determinism of such expressions so that it runs in linear time, and we implement the improved algorithm and evaluate it experimentally. Compared with the corresponding method of a popular production-quality XML Schema processor, the new implementation runs orders of magnitude faster. We also discuss how to extend the solution to cover further extensions of XML Schema without compromising its linear scalability.
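To make the notion of determinism concrete, the following is a minimal sketch of the textbook position-based test for plain regular expressions: an expression is deterministic (one-unambiguous) when no two competing positions carry the same symbol, either at the start of the expression or after any given position. This is not the linear-time algorithm of the paper, and it does not handle numeric occurrence indicators. The tuple encoding of expressions and the name is_deterministic are assumptions made for this illustration only.

```python
from itertools import count

# Hypothetical tuple encoding of expressions, used only in this sketch:
#   ("sym", "a")      a single element name
#   ("cat", e1, e2)   sequence
#   ("alt", e1, e2)   choice
#   ("star", e)       zero or more repetitions
#   ("opt", e)        zero or one occurrence


def is_deterministic(expr):
    """Return True if no two competing positions share a symbol."""
    ids = count()
    label = {}   # position -> symbol at that position
    follow = {}  # position -> positions that may come directly after it

    def walk(e):
        # Returns (nullable, first positions, last positions) of e
        # and fills in the follow sets as a side effect.
        kind = e[0]
        if kind == "sym":
            p = next(ids)
            label[p] = e[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "cat":
            n1, f1, l1 = walk(e[1])
            n2, f2, l2 = walk(e[2])
            for p in l1:
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if kind == "alt":
            n1, f1, l1 = walk(e[1])
            n2, f2, l2 = walk(e[2])
            return n1 or n2, f1 | f2, l1 | l2
        if kind in ("star", "opt"):
            _, f, l = walk(e[1])
            if kind == "star":
                for p in l:
                    follow[p] |= f
            return True, f, l
        raise ValueError(f"unknown node kind: {kind!r}")

    _, first, _ = walk(expr)

    def unique(positions):
        symbols = [label[p] for p in positions]
        return len(symbols) == len(set(symbols))

    return unique(first) and all(unique(s) for s in follow.values())


# (a | b)* a : on reading "a" a parser cannot tell which "a" it matched.
AMBIGUOUS = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))

# b* a (b* a)* : same language, but every step is decided by one symbol.
STAR_B_A = ("cat", ("star", ("sym", "b")), ("sym", "a"))
DETERMINISTIC = ("cat", STAR_B_A, ("star", STAR_B_A))

print(is_deterministic(AMBIGUOUS))      # False
print(is_deterministic(DETERMINISTIC))  # True
```

The sketch stores an explicit follow set for every position, so it can take quadratic space and time in the size of the expression. It is meant only to show what is being checked, not how to check it efficiently.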

Keywords: Regular expression, Numeric occurrence indicator, One-unambiguity, Weak determinism, Unique particle attribution, Java

Review history: Received 21 January 2010, Revised 13 August 2010, Accepted 19 October 2010, Available online 26 October 2010.

Official URL: https://doi.org/10.1016/j.is.2010.10.001