Fast range query estimation by N-level tree histograms

作者:

Highlights:

摘要

Histograms are a lossy compression technique widely applied in various application contexts, like query optimization, statistical and temporal databases, OLAP applications, data streams, and so on. In most cases, accuracy in reconstructing from the histogram some original information, plays a crucial role. Thus, several proposals for constructing histograms trying to maximize their accuracy, have been given in the recent past. Besides bucket-based histograms (i.e., histograms whose construction is driven by the search of a “good” domain partition), there are different new histograms, characterized by more complex structures (like, for instance, wavelet-based histograms). This paper presents a new histogram, called nLT, belonging to the latter class. It is based on a hierarchical decomposition of the original data distribution kept in a full binary tree. This tree, containing a set of pre-computed hierarchical queries, uses bit saving for representing integer numbers, so that the reduced storage space allows us to increase the tree resolution and, consequently, its accuracy. Experimental comparison shows the superiority of nLT w.r.t. the state-of-the-art histograms.

论文关键词:Query optimization,Optimization and performance,Range query estimation,Histograms,Data reduction

论文评审过程:Received 5 April 2004, Accepted 19 April 2004, Available online 24 May 2004.

论文官网地址:https://doi.org/10.1016/j.datak.2004.04.004