An efficient workload-based data layout scheme for multidimensional data

Abstract

Physical data layout is a crucial factor in the performance of queries and updates in large data warehouses. Data layout enhances and complements other performance features such as materialized views and dynamic caching of aggregated results. Prior work has identified that the multidimensional nature of large data warehouses imposes natural restrictions on the query workload. A method based on a “uniform” query class approach has been proposed for data clustering and shown to be optimal. However, we believe that realistic query workloads will exhibit data access skew. For instance, if time is a dimension in the data model, then more queries are likely to focus on the most recent time interval. The query class approach does not adequately model the possibility of multidimensional data access skew. We propose the affinity graph model for capturing workload characteristics in the presence of access skew and describe an efficient algorithm for physical data layout. Our proposed algorithm considers declustering and load balancing issues which are inherent to the multidisk data layout problem. We demonstrate the validity of this approach experimentally.
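
The abstract does not spell out the affinity graph model or the layout algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a workload given as (set of data chunks, query frequency) pairs, builds a graph whose edge weights count how often two chunks are accessed together, and then orders chunks with a simple greedy heuristic so that frequently co-accessed chunks end up adjacent. The names `build_affinity_graph` and `greedy_layout`, the chunk labels, and the heuristic itself are all hypothetical. Declustering across disks and load balancing are not modelled here.

```python
from collections import defaultdict
from itertools import combinations


def build_affinity_graph(workload):
    """Return {(a, b): weight} with a < b.

    The weight of an edge is the summed frequency of all queries that
    touch both chunks. This weighting is an assumption for illustration.
    """
    graph = defaultdict(int)
    for chunks, frequency in workload:
        for a, b in combinations(sorted(set(chunks)), 2):
            graph[(a, b)] += frequency
    return dict(graph)


def greedy_layout(graph, chunks):
    """Order chunks so that strongly co-accessed chunks are placed early
    and close together (a simple greedy heuristic, not an optimal layout)."""

    def affinity(a, b):
        return graph.get((min(a, b), max(a, b)), 0)

    remaining = set(chunks)
    if not remaining:
        return []

    # Seed with the chunk that has the largest total affinity;
    # ties are broken by chunk name to keep the result deterministic.
    first = min(
        remaining,
        key=lambda c: (-sum(affinity(c, o) for o in remaining if o != c), c),
    )
    order = [first]
    remaining.remove(first)

    # Repeatedly append the chunk with the highest affinity to what is placed.
    while remaining:
        nxt = min(
            remaining,
            key=lambda c: (-sum(affinity(c, p) for p in order), c),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical skewed workload: most queries hit the most recent quarters.
workload = [
    ({"2001-Q3", "2001-Q4"}, 5),
    ({"2001-Q4"}, 3),
    ({"2001-Q1", "2001-Q2"}, 1),
    ({"2001-Q2", "2001-Q3"}, 1),
]
graph = build_affinity_graph(workload)
layout = greedy_layout(graph, ["2001-Q1", "2001-Q2", "2001-Q3", "2001-Q4"])
print(layout)
```

In this toy workload the heavily queried recent quarters are placed first and next to each other, which is the kind of access skew a uniform query class model would not capture.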

Keywords: Data warehousing, Data layout, Data clustering, Affinity graphs

Review history: Received 19 June 2001, Revised 24 July 2001, Accepted 24 July 2001, Available online 18 October 2001.

Official paper URL: https://doi.org/10.1016/S0169-023X(01)00043-X