QETL: An approach to on-demand ETL from non-owned data sources

Authors:

Highlights:

Abstract

In traditional OLAP systems, the ETL process loads all available data into the data warehouse before users start querying them. In some cases, this may be either inconvenient (because the data are supplied by a provider for a fee) or infeasible (because of their size); on the other hand, directly launching each analysis query on the source data would not enable data reuse, leading to poor performance and high costs. The alternative investigated in this paper is to fetch and store data on demand, i.e., as they are needed during the analysis process. To this end, we propose the Query-Extract-Transform-Load (QETL) paradigm to feed a multidimensional cube; the idea is to fetch facts from the source data provider, load them into the cube only when they are needed to answer some OLAP query, and drop them when free space is needed to load other facts. Notably, QETL includes an optimization step to cheaply extract the required data based on the specific features of the data provider. Experimental tests on a real case study in the genomics area show that QETL effectively reuses data to cut extraction costs, leading to significant performance improvements.
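The fetch-on-demand, load, and drop cycle described in the abstract can be illustrated with a small sketch. This is not the QETL implementation: the class `OnDemandCube`, the callback `fake_provider`, and the least-recently-used eviction policy are all assumptions made for illustration (the abstract only says facts are dropped when free space is needed), and the provider-specific extraction optimization is not modeled.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, Iterable, List


class OnDemandCube:
    """Toy cube that fetches facts only when a query needs them (hypothetical)."""

    def __init__(
        self,
        fetch: Callable[[List[Hashable]], Dict[Hashable, float]],
        capacity: int,
    ) -> None:
        self.fetch = fetch          # stand-in for the external data provider
        self.capacity = capacity    # maximum number of facts kept in the cube
        self.facts: "OrderedDict[Hashable, float]" = OrderedDict()
        self.fetched_count = 0      # facts extracted from the provider so far

    def query(self, coords: Iterable[Hashable]) -> Dict[Hashable, float]:
        coords = list(coords)
        # Only facts not already loaded are requested from the provider.
        missing = [c for c in dict.fromkeys(coords) if c not in self.facts]
        fetched = self.fetch(missing) if missing else {}
        self.fetched_count += len(missing)

        # Answer the query from loaded facts plus the freshly fetched ones.
        result: Dict[Hashable, float] = {}
        for c in coords:
            if c in self.facts:
                self.facts.move_to_end(c)   # mark as recently used
                result[c] = self.facts[c]
            else:
                result[c] = fetched[c]

        # Load the new facts, then drop the least recently used ones
        # if the cube exceeds its capacity (assumed policy).
        for c in missing:
            self.facts[c] = fetched[c]
        while len(self.facts) > self.capacity:
            self.facts.popitem(last=False)
        return result


def fake_provider(keys: List[Hashable]) -> Dict[Hashable, float]:
    # Placeholder provider: returns a constant measure for every coordinate.
    return {k: 1.0 for k in keys}


cube = OnDemandCube(fake_provider, capacity=2)
cube.query(["a", "b"])      # both facts are fetched and loaded
cube.query(["b", "c"])      # "b" is reused; only "c" is fetched
print(cube.fetched_count)
```

In this sketch the second query reuses the fact already in the cube, so only one additional fact is extracted from the provider; that reuse is what the abstract credits for the reduction in extraction cost.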

Keywords: On-demand ETL, Incremental loading, OLAP

Review history: Received 28 April 2016, Revised 15 September 2017, Accepted 16 September 2017, Available online 25 September 2017, Version of Record 13 November 2017.

Official URL: https://doi.org/10.1016/j.datak.2017.09.002