Universal indexes for highly repetitive document collections
Authors:
Highlights:
• We study how existing indexes perform in highly repetitive document collections.
• We design new inverted index variants for this kind of collection.
• We implement, adapt, and/or tune existing self-indexes for this case.
• We obtain significant space reductions, at a moderate price in query time.
• We obtain larger reductions on self-indexes, but at a higher price in query time.
Keywords: Repetitive collections, Inverted index, Self-index
Article history: Received 22 March 2016, Revised 8 April 2016, Accepted 9 April 2016, Available online 27 April 2016, Version of Record 19 May 2016.
Paper URL: https://doi.org/10.1016/j.is.2016.04.002