Grammar-compressed indexes with logarithmic search time

作者:

Highlights:

摘要

Let a text T[1..n] be the only string generated by a context-free grammar with g (terminal and nonterminal) symbols, and of size G (measured as the sum of the lengths of the right-hand sides of the rules). Such a grammar, called a grammar-compressed representation of T, can be encoded using Glg⁡G bits. We introduce the first grammar-compressed index that uses O(Glg⁡n) bits (precisely, Glg⁡n+(2+ϵ)Glg⁡g for any constant ϵ>0) and can find the occ occurrences of patterns P[1..m] in time O((m2+occ)lg⁡G). We implement the index and demonstrate its practicality in comparison with the state of the art, on highly repetitive text collections.

论文关键词:Highly repetitive text collections,Grammar compression,Text indexing,Compressed full-text self-indices

论文评审过程:Received 25 March 2020, Revised 11 November 2020, Accepted 1 December 2020, Available online 30 December 2020, Version of Record 30 December 2020.

论文官网地址:https://doi.org/10.1016/j.jcss.2020.12.001