Reasoning about Strings in Databases

Abstract

In order to enable the database programmer to reason about relations over strings of arbitrary length, we introduce alignment calculus, a modal extension of the relational calculus. In addition to relations, a state in the model consists of a two-dimensional array where the strings are aligned on top of each other. The basic modality in the language (a transpose, or “slide”) rearranges this alignment, and more complex formulae can be formed using a syntax reminiscent of regular expressions, in addition to the usual connectives and quantifiers. It turns out that the computational counterpart of the string-based portion of the logic is the class of multitape two-way finite state automata, which are devices particularly well suited for the implementation of string matching. A computational counterpart of the full logic is obtained from relational algebra by performing selection with these devices. Safety of formulae in alignment calculus implies that new strings generated from old ones have to be of bounded length. While an undecidable property in general, this boundedness is decidable for an important subclass of formulae. As far as expressive power is concerned, alignment calculus includes previous proposals for querying string databases and gives full Turing computability. The language can be restricted to define exactly the regular sets and the sets in each level of the polynomial-time hierarchy above P.
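
To make the automaton model mentioned above more concrete, the following is a minimal sketch of a deterministic two-tape two-way finite state automaton, written under stated assumptions rather than taken from the paper. The names `run_automaton` and `is_reverse_delta`, the end markers `<` and `>`, and the step bound are all hypothetical choices made for illustration. The example automaton accepts a pair of strings exactly when the second is the reverse of the first, a check that needs a head to travel in both directions. A head move on one tape is loosely analogous to sliding that string in the alignment.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical end markers; assumed not to occur in the input strings.
LEFT_END, RIGHT_END = "<", ">"

# A transition maps (state, symbols under the heads) to
# (next state, one move per head), each move being -1, 0, or +1.
# Returning None means the automaton is stuck and rejects.
Step = Optional[Tuple[str, Tuple[int, ...]]]
Transition = Callable[[str, Tuple[str, ...]], Step]


def run_automaton(delta: Transition, strings: Sequence[str],
                  start: str = "start", accept: str = "accept",
                  max_steps: int = 10_000) -> bool:
    """Simulate a multitape two-way automaton on the given strings."""
    tapes = [LEFT_END + s + RIGHT_END for s in strings]
    heads = [0] * len(tapes)  # every head starts on its left end marker
    state = start
    for _ in range(max_steps):  # bound guards against looping automata
        if state == accept:
            return True
        symbols = tuple(t[h] for t, h in zip(tapes, heads))
        step = delta(state, symbols)
        if step is None:
            return False
        state, moves = step
        # Heads are clamped so they never leave the marked tape.
        heads = [min(max(h + m, 0), len(t) - 1)
                 for h, m, t in zip(heads, moves, tapes)]
    return state == accept


def is_reverse_delta(state: str, symbols: Tuple[str, ...]) -> Step:
    """Transitions for: tape 2 is the reverse of tape 1."""
    a, b = symbols
    if state == "start":
        # Phase 1: send head 2 to the right end marker; head 1 waits.
        if b != RIGHT_END:
            return ("start", (0, +1))
        # Head 1 steps onto its first symbol, head 2 onto its last.
        return ("compare", (+1, -1))
    if state == "compare":
        # Phase 2: head 1 moves right while head 2 moves left.
        if a == RIGHT_END and b == LEFT_END:
            return ("accept", (0, 0))
        if a == b and a not in (LEFT_END, RIGHT_END):
            return ("compare", (+1, -1))
    return None


print(run_automaton(is_reverse_delta, ["abc", "cba"]))  # True
print(run_automaton(is_reverse_delta, ["ab", "ab"]))    # False
```

Used as a selection filter over a relation of string pairs, such a predicate would keep only the tuples the automaton accepts, which is the role the abstract assigns to these devices inside relational algebra.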