Performance evaluation of word-aligned compression methods for bitmap indices

作者:Gheorghi Guzun, Guadalupe Canahuate

摘要

Bitmap indices are a widely used scheme for large read-only repositories in data warehouses and scientific databases. This binary representation allows the use of bit-wise operations for fast query processing and is typically compressed using run-length encoding techniques. Most bitmap compression techniques are aligned using a fixed encoding length (32 or 64 bits) to avoid explicit decompression during query time. They have been proposed to extend or enhance word-aligned hybrid (WAH) compression. This paper presents a comparative study of four bitmap compression techniques: WAH, PLWAH, CONCISE, and EWAH. Experiments are targeted to identify the conditions under which each method should be applied and quantify the overhead incurred during query processing. Performance in terms of compression ratio and query time is evaluated over synthetic-generated bitmap indices, and results are validated over bitmap indices generated from real data sets. Different query optimizations are explored, query time estimation formulas are defined, and the conditions under which one method should be preferred over another are formalized.

论文关键词:Bitmap indices, Word-aligned compression, Data warehouses, Performance comparison and estimation

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-015-0877-9