A generic and customizable framework for the design of ETL scenarios

作者:

Highlights:

摘要

Extraction–transformation–loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. In this paper, we delve into the logical design of ETL scenarios and provide a generic and customizable framework in order to support the DW designer in his task. First, we present a metamodel particularly customized for the definition of ETL activities. We follow a workflow-like approach, where the output of a certain activity can either be stored persistently or passed to a subsequent activity. Also, we employ a declarative database programming language, LDL, to define the semantics of each activity. The metamodel is generic enough to capture any possible ETL activity. Nevertheless, in the pursuit of higher reusability and flexibility, we specialize the set of our generic metamodel constructs with a palette of frequently used ETL activities, which we call templates. Moreover, in order to achieve a uniform extensibility mechanism for this library of built-ins, we have to deal with specific language issues. Therefore, we also discuss the mechanics of template instantiation to concrete activities. The design concepts that we introduce have been implemented in a tool, arktos ii, which is also presented.

论文关键词:Data warehousing,ETL

论文评审过程:Available online 7 December 2004.

论文官网地址:https://doi.org/10.1016/j.is.2004.11.002