Range query estimation with data skewness for top-k retrieval

Authors:

Highlights:

• This paper extends the query-mapping method for top-k retrieval in a relational DB.

• Top-k retrieval finds a small set of approximate results for user-specified values.

• Query-mapping involves converting a top-k query into a range query.

• The proposed method incorporates data skewness in cost-based query-mapping.

• Experiments show improved efficiency and robustness across parameters.

Abstract

Top-k querying can significantly improve the performance of web-based business intelligence applications such as price comparison and product recommendation systems. Top-k retrieval involves finding a limited number of records in a relational database that are most similar to user-specified attribute-value pairs. This paper extends the cost-based query-mapping method for top-k retrieval by incorporating data skewness in range estimation. Experiments on real-world and synthetic multi-attribute data sets show that incorporating data skewness provides robust performance across different types of data sets, query sets, distance functions, and histograms.
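To make the idea of query-mapping concrete, the following is a minimal sketch of turning a top-k query into a range query, with a restart when the range returns too few records. It is not the paper's cost-based method and does not model data skewness: the initial radius is simply passed in by the caller, whereas the paper estimates the range from histograms. The function names, the `growth` parameter, and the use of the maximum (L-infinity) distance are assumptions made for illustration; the in-memory filter stands in for a SQL range predicate such as `BETWEEN`.

```python
def max_distance(record, query):
    """L-infinity distance between a record and the query point (assumed metric)."""
    return max(abs(value - target) for value, target in zip(record, query))


def range_query(records, query, radius):
    """Stand-in for a relational range query: keep records inside the box
    [q_i - radius, q_i + radius] on every attribute."""
    return [
        record
        for record in records
        if all(target - radius <= value <= target + radius
               for value, target in zip(record, query))
    ]


def top_k_via_range(records, query, k, radius, growth=2.0):
    """Answer a top-k query by issuing range queries of increasing size.

    Returns the k nearest records and the number of restarts needed.
    With the L-infinity distance, a box of half-width `radius` holds exactly
    the records within distance `radius`, so k or more candidates in the box
    are guaranteed to include the true top-k.
    """
    if k > len(records):
        raise ValueError("k exceeds the number of records")
    if radius <= 0 or growth <= 1:
        raise ValueError("radius must be positive and growth must exceed 1")

    restarts = 0
    while True:
        candidates = range_query(records, query, radius)
        if len(candidates) >= k:
            candidates.sort(key=lambda record: max_distance(record, query))
            return candidates[:k], restarts
        # Too few results: the range was underestimated, so enlarge and retry.
        radius *= growth
        restarts += 1
```

In this sketch, a radius that is too small costs extra restarts, while a radius that is too large retrieves more candidates than needed. That trade-off is what a cost-based range estimate is meant to balance.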

Keywords: Top-k query, Query-mapping, Query processing, Cost model, RDBMSs

Article history: Received 10 March 2013, Revised 2 August 2013, Accepted 16 September 2013, Available online 23 September 2013.

Official paper URL: https://doi.org/10.1016/j.dss.2013.09.005